Regional and Temporal Patterns of Partisan Polarization during the COVID-19 Pandemic in the United States and Canada

Zachary Yang [1,3] ([email protected]), Anne Imouza, Maximilian Puelma Touzel, Cécile Amadoro, Gabrielle Desrosiers-Brisebois, Kellin Pelrine, Sacha Levy, Jean-François Godbout, Reihaneh Rabbany

[1] School of Computer Science, McGill University
[2] Political Science, McGill University
[3] Montreal Institute for Learning Algorithms
[4] Department of Political Science, University of Montreal

Abstract

Public health measures were among the most polarizing topics debated online during the COVID-19 pandemic. Much of the discussion surrounded specific events, such as when and which particular interventions came into practice. In this work, we develop and apply an approach to measure subnational and event-driven variation of partisan polarization and explore how these dynamics varied both across and within countries. We apply our measure to a dataset of over 50 million tweets posted during late 2020, a salient period of polarizing discourse in the early phase of the pandemic. In particular, we examine regional variations in both the United States and Canada, focusing on three specific health interventions: lockdowns, masks, and vaccines. We find that more politically conservative regions had higher levels of partisan polarization in both countries, especially in the US, where a strong negative correlation exists between regional vaccination rates and the degree of polarization in vaccine-related discussions. We then analyze the timing, context, and profile of spikes in polarization, linking them to specific events discussed on social media across different regions in both countries. These spikes typically last only a few days, suggesting that online discussions reflect and could even drive changes in public opinion, which, in the context of pandemic response, impacts public health outcomes across different regions and over time.

keywords:
Polarization, COVID-19, Social Media, Computational Social Science

1 Introduction

Partisan polarization is increasingly prevalent in democracies around the world [1]. In the United States, the level of opposition between Democrats and Republicans has been steadily growing for decades [2, 3] and reached unprecedented heights during the 2020 presidential election [1, 4]. This polarization even affected how individuals responded to the COVID-19 pandemic by influencing their assessment of the dangers posed by the virus and their response to public health measures [5, 6, 7]. Several studies have now confirmed that supporters of the Democratic party were more likely to follow social distancing measures [8, 9, 10], wear masks [11, 12], and get vaccinated [13, 14, 15] when compared to their Republican counterparts. This polarizing trend among partisans is not limited to the United States [16, 17]. Other countries, like Canada, have also experienced the rapid politicization of pandemic responses, where researchers found that supporters of the Liberal Party were more likely to follow COVID-19 guidelines than supporters of the Conservative Party or the populist People’s Party [18, 19].

While countries like the United States and Canada adopted strategies to coordinate efforts in addressing the pandemic at the national level, there was still significant variation in both the amount and types of public health interventions introduced by subnational governments. For instance, Canadian provinces implemented different policies, ranging from comprehensive lockdowns and school closures to more targeted guidelines focusing on specific populations [20]. Similar variation can be observed in the United States, where some states implemented strict lockdown orders and mask mandates, while others declined to impose social distancing requirements [21]. Much like at the national level, these different regional policies also became rapidly politicized along partisan lines [17], especially on social media platforms, where the politicization of the COVID-19 pandemic largely unfolded [22, 23, 10, 24].

Numerous studies have now confirmed that online discussions surrounding the pandemic [25, 9, 26, 27] exhibited clear regional patterns characterized by the same partisan animosity that impacted the heterogeneous implementation of public health measures [25] and the resulting epidemiological outcomes [9]. Additional research highlights that these partisan divisions also contributed to the increasing polarization observed on social media [25, 27, 24]. The intense reactions from both supporters and opponents of public health measures [28] imply that public opinion could have been significantly influenced by local political dynamics and the geography of the pandemic [29]. This regional heterogeneity provides us with a unique opportunity to study polarization around specific events, topics, and regions, to understand how various factors affected compliance with COVID-19 guidelines. However, reliably measuring the polarization of public discourse at a more fine-grained resolution is a challenge, even with the large quantity of human-generated text with extensive metadata available on digital platforms.

In this work, we propose a solution to this measurement problem by introducing a comprehensive approach to better understand the geographic and event-driven variation of online partisan polarization of COVID-19 discussions within American states and Canadian provinces. We examine variations in public discourse as contained on Twitter (now X) to determine how polarization is related to: (1) the ideological leanings of different regions; (2) the amount of conspiracy theory-related messages that users have been exposed to; and (3) vaccination data.

The paper is organized as follows. First, we briefly outline our approach to obtaining region- and time-resolved polarization data from Twitter (X). In this section, we also describe our machine learning method to classify users as conservative or liberal and justify our choice of topic-conditioned language dissimilarity as a proxy for partisan polarization. Next, we present the results of applying our approach to a large-scale dataset we collected in 2020, filtered through three prominent pandemic-related public health interventions: lockdowns, masks, and vaccines. Our findings indicate that conservative regions in both countries exhibited higher polarization levels on these topics overall. We also find a strong negative correlation between vaccination rates in different U.S. regions and the level of polarization in their online discussions related to vaccines. We close with a discussion of the limitations of the approach and promising new areas of application.

2 Approach

Figure 1: Overview of the proposed method to estimate partisan polarization over date, region, and topic (top), as well as how to analyze this data by collapsing it over any of those three dimensions (bottom). We studied the topics of lockdowns, masks, and vaccines.

We developed a method to measure geographically-resolved partisan polarization over time from large-scale social media message datasets (see fig. 1). The language of political discussions across socio-demographic groups can vary significantly [30], each having their own lexicon, so dissimilar language on its own does not imply polarized positions. However, when conditioning on discussion of the same, contentious topic, the dissimilarity of the language used by different demographics is more likely to reflect alternative semantic understandings of that specific topic, which we assume are highly correlated with polarization. This correlation is weakened by linguistic differences not captured by the particular definition of semantic separation used, so the latter is only a noisy proxy of polarization strength. That said, we expect that a stronger correlation between semantic separation and polarization exists when language is represented in a more expressive model. Inspired by the demonstrated capacity of modern vector embeddings to represent the semantics of words, our approach focuses on transforming the sentences of social media posts using RoBERTa, a powerful open-source, language-embedding model [31]. As an indicator for polarization, we then measure dissimilarity by how far apart the tweets of left- and right-leaning users are in this embedding space. In particular, we use the C-index [32], a robust clustering measure based on the average of pairwise distances of embedded partisan users within a partisan group relative to the average of the largest and smallest distances overall. To label partisans in our data, we developed and validated a machine learning method that identifies users as conservative or liberal on the ideological spectrum by cross-indexing multiple metadata sources. We also developed and validated a method to geolocate users to resolve polarization’s geographic heterogeneity.
Details of these components of the approach can be found in the Methods section.
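As a concrete illustration of the clustering measure, the following minimal sketch computes the standard Hubert-Levin form of the C-index for embedded users with partisan labels. It is a sketch under stated assumptions, not the implementation used in this work: the function name `c_index`, the choice of Euclidean distance, and the toy two-dimensional points are hypothetical, and real RoBERTa embeddings have far more dimensions. In this standard form, values near 0 mean that within-group distances are among the smallest overall (well-separated groups), and values near 1 mean the opposite; how the index is mapped to a polarization score is not assumed here.

```python
from itertools import combinations
from math import dist


def c_index(points, labels):
    """Standard Hubert-Levin C-index: (S_w - S_min) / (S_max - S_min).

    S_w   = sum of distances over all within-group pairs,
    S_min = sum of the N_w smallest pairwise distances overall,
    S_max = sum of the N_w largest pairwise distances overall,
    where N_w is the number of within-group pairs.
    Euclidean distance is an assumption made for this sketch.
    """
    pairs = list(combinations(range(len(points)), 2))
    all_d = sorted(dist(points[i], points[j]) for i, j in pairs)
    within = [dist(points[i], points[j])
              for i, j in pairs if labels[i] == labels[j]]
    n_w = len(within)
    if n_w == 0 or n_w == len(all_d):
        raise ValueError("need at least two groups and one within-group pair")
    s_w = sum(within)
    s_min = sum(all_d[:n_w])
    s_max = sum(all_d[-n_w:])
    if s_max == s_min:
        raise ValueError("all pairwise distances are identical")
    return (s_w - s_min) / (s_max - s_min)


# Toy data (hypothetical): two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
separated = c_index(points, ["left", "left", "right", "right"])
# Same points, but labels that cut across the two spatial clusters.
mixed = c_index(points, ["left", "right", "left", "right"])
```

With these toy points, the labelling that matches the spatial clusters gives the lowest possible value, while the labelling that cuts across them gives a value close to 1.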

2.1 Application to late 2020 pandemic discourse in the United States & Canada

We collected a large-scale dataset of COVID-19 political discussions on Twitter (X) occurring between October 9th, 2020 and January 4th, 2021, comprising about 46.6 million tweets linked to Canada and 12.5 million tweets linked to the United States. We geolocated users based on their provided location and classified them by their declared party affiliation. Specifically, we include identifiers for the two major liberal and conservative ideological divisions in each country: the left (Liberal Party, New Democratic Party, and Green Party) and the right (Conservative Party and People’s Party) for Canada; and the Democratic (left) and Republican (right) parties for the United States. Using official population census and election results, we verify that these data provide a politically balanced set of users in the different regions of these two countries. For each of the users, we then compute a vector representation for the language they used in their social media messages during this period. We condition on region, time, and topic and, for each combination, compute the value of the C-index as a proxy for polarization. To narrow the content of the messages analyzed, we focus on three specific topics of discussion: lockdowns, masks, and vaccines. These topics were chosen because of their salience for polarized discourse around the pandemic [27, 33, 34], and because they span different types of interventions (group behaviour, individual behaviour, and medicine, respectively). Based on these measurements, we then compare the polarization observed in different American states and Canadian provinces over time for each of the three topics. We also look into how polarization is correlated with epidemiological data and conspiracy-related content. We refer the reader to the Methods section for further details.
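The conditioning step described above can be pictured as grouping user embeddings into cells keyed by region, time window, and topic, and then scoring each cell separately. The sketch below is only an illustration: the record fields, the function name `group_cells`, the use of ISO calendar weeks (which start on Monday), and the `min_per_side` threshold are all assumptions made here and are not taken from the Methods section.

```python
from collections import defaultdict
from datetime import date


def group_cells(records, min_per_side=2):
    """Group embeddings by (region, ISO week, topic) and by partisan side.

    Cells with fewer than `min_per_side` users on either side are dropped,
    since a clustering score on such a cell would be unreliable.
    """
    cells = defaultdict(lambda: {"left": [], "right": []})
    for rec in records:
        iso = rec["date"].isocalendar()  # (ISO year, ISO week, weekday)
        key = (rec["region"], (iso[0], iso[1]), rec["topic"])
        cells[key][rec["leaning"]].append(rec["embedding"])
    return {key: sides for key, sides in cells.items()
            if min(len(sides["left"]), len(sides["right"])) >= min_per_side}


# Hypothetical toy records; regions and embeddings are placeholders.
records = [
    {"region": "Region A", "date": date(2020, 10, 12), "topic": "masks",
     "leaning": "left", "embedding": (0.0, 0.1)},
    {"region": "Region A", "date": date(2020, 10, 13), "topic": "masks",
     "leaning": "left", "embedding": (0.1, 0.0)},
    {"region": "Region A", "date": date(2020, 10, 12), "topic": "masks",
     "leaning": "right", "embedding": (4.0, 4.1)},
    {"region": "Region A", "date": date(2020, 10, 13), "topic": "masks",
     "leaning": "right", "embedding": (4.1, 4.0)},
    {"region": "Region B", "date": date(2020, 10, 12), "topic": "masks",
     "leaning": "left", "embedding": (1.0, 1.0)},  # too sparse, dropped
]
cells = group_cells(records)
```

Each surviving cell can then be passed to whichever separation score is used, yielding one value per region, week, and topic.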

3 Results

We organize the presentation of results as follows. First, we report the geographical trends of the observed partisan polarization in the United States and Canada and confirm that conservative states and provinces display more polarized online discourse. Next, we highlight the correlation between partisan polarization on the topic of vaccines and the vaccination rates found across U.S. states. We then present our event-based analysis of the temporal patterns of polarization at the national level in both countries and report correlations between polarization and vaccination data, as well as the volume of conspiracy-related content on Twitter (X). Finally, we examine the different peaks in polarization and explain how they relate to various polarizing events.

Panels: (a) Lockdown Polarization; (b) Mask Polarization; (c) Vaccine Polarization; (d) % of Conspiracy-related Tweets.
Figure 2: Regional distribution of partisan polarization in the United States on three key topics of Lockdown (a), Mask (b), and Vaccines (c). Color intensity from light to dark gives the amount of polarization measured weekly between October 11th, 2020 and January 3rd, 2021 and then averaged over the 12 weeks. We also report the average weekly percentage of conspiracy-related tweets that are posted from users in each region in panel (d).
Panels: (a) Lockdowns Polarization; (b) Masks Polarization; (c) Vaccines Polarization; (d) % of Conspiracy-related Tweets.
Figure 3: Regional distribution of partisan polarization in Canada on three key topics of Lockdowns (a), Masks (b), and Vaccines (c). The polarization is measured weekly between October 11th, 2020 and January 3rd, 2021, and the average over the 12 weeks is used for this plot. We also report the average weekly percentage of conspiracy-related tweets that are posted from users in each region (d). Province and territory boundaries are colored based on the number of users we had in our data from those regions, which indicates the support for our measurement: light grey for fewer than 100 users, grey for between 100 and 1,000 users, and black for more than 1,000 users.

3.1 Regional Variation in Partisan Polarization

Our analysis begins by visualizing the geography of partisan polarization in fig. 2 and fig. 3 for the United States and Canada. The regional heterogeneity in the amount of polarization observed over different topics is apparent in both countries. We also see heterogeneity in the amount of conspiracy-related tweets shown in fig. 2d and fig. 3d for both countries, respectively.

Figure 4: Partisan polarization ranking of American states per topic and overall. A ranking of 1 signifies the highest average weekly polarization between October 11th, 2020 and January 3rd, 2021 (12 weeks). State names are colored based on the vote margin for the conservative party from the 2020 United States Presidential Election (Conservative Party: Republican Party; Liberal Party: Democratic Party).
Figure 5: Partisan polarization ranking of Canadian provinces and territories per topic and overall. A ranking of 1 signifies the highest average weekly polarization between October 11th, 2020 and January 3rd, 2021 (12 weeks). Province or territory names are colored (red to blue) based on the vote margin for the conservative party family from Canada’s 2019 Federal Election (Liberal Party Family: Liberal, New Democratic Party, Green; Conservative Party Family: Conservative, People’s Party). Line colors have a transparency to reflect the support for the measurement, based on the number of users in that region.
Figure 6:
(a) Correlation between Polarization Score and Vote Margin for Conservative Party. Colors (blue to red) are conservative party vote margin (same as fig. 4 and fig. 5). Significant correlation between Polarization Score and Vote Margin is found for the US discourse on masks and on vaccines, for which the respective Pearson correlation r and p-value are shown.
(b) Relation between vaccines polarization and vaccination rates in the United States. Color (blue to red) is again the respective conservative party vote margin from the 2020 U.S. Presidential Election. The correlation is -0.77 with CI = [-0.86, -0.62] (n = 51, p = 6.97e-11).

Next, we examine how this heterogeneity varies with the partisan leaning of each region, using election voting patterns (the 2020 presidential election in the case of the United States and the 2019 federal election for Canada). Our first observation is that conservative states and provinces show higher levels of polarization compared to their liberal counterparts. To display results over all covered regions, we show the polarization rankings for American states and Canadian provinces in fig. 4 and fig. 5, respectively. This ranking is applied separately to each of the three topics and is based on their weekly polarization averaged over the 12-week period. An additional fourth ranking, labelled overall, gives the average over the three topics. Each region is assigned a color graded from blue to red based on the vote margin for the Republican Party (US) or the conservative party family (Canada) in the most recent election in the corresponding country, such that the names of predominantly conservative/Republican regions appear in red, predominantly liberal/Democratic regions in blue, and mixed or less definitive regions in purple. Referring to fig. 4, in the United States we can see clearly that conservative states are more polarized compared to liberal states overall and specifically on discussions related to masks and vaccines. The ranking is more mixed in the discussions about lockdown measures, with outliers from both liberal and conservative states: namely Idaho, Alabama, and Arkansas showing the least polarization, and Delaware (ranked 1st), Colorado (ranked 17th), and New Jersey (ranked 15th) showing higher values.
The expected relationship between pandemic response and state partisanship is, however, still present for lockdown discussions, with liberal states such as Vermont (ranked 41st) and Massachusetts (ranked 43rd) displaying less polarization compared to more conservative states such as Mississippi (ranked 2nd), North Dakota (ranked 3rd), and Oklahoma (ranked 4th).
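The ranking procedure described above (average the weekly scores per region and topic, rank within each topic with 1 as the most polarized, and add an overall ranking from the mean over topics) can be sketched as follows. The function name `rank_regions`, the input layout, and the toy values are hypothetical; ties are broken arbitrarily in this sketch, which may differ from how the figures were produced.

```python
from statistics import mean


def rank_regions(weekly_scores):
    """weekly_scores: {topic: {region: [weekly polarization values]}}.

    Returns {topic or "overall": {region: rank}}, where rank 1 is the
    highest average weekly polarization.
    """
    averages = {topic: {region: mean(vals) for region, vals in regions.items()}
                for topic, regions in weekly_scores.items()}
    # Only regions present for every topic receive an overall average.
    common = set.intersection(*(set(avg) for avg in averages.values()))
    averages["overall"] = {region: mean(averages[t][region] for t in weekly_scores)
                           for region in common}
    ranks = {}
    for key, avg in averages.items():
        ordered = sorted(avg, key=avg.get, reverse=True)
        ranks[key] = {region: i + 1 for i, region in enumerate(ordered)}
    return ranks


# Hypothetical toy scores for three placeholder regions and two topics.
toy = {
    "lockdowns": {"A": [0.2, 0.4], "B": [0.5, 0.7], "C": [0.1, 0.1]},
    "masks": {"A": [0.9, 0.7], "B": [0.2, 0.2], "C": [0.4, 0.6]},
}
ranks = rank_regions(toy)
```

In the toy data, region B leads on lockdowns and region A on masks, while the overall ranking places A first because its mean across topics is highest.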

Turning to Canada (fig. 5), we find that Alberta, a conservative province, shows higher polarization than Ontario and British Columbia (among the Canadian provinces with the largest numbers of social media users). Quebec is the highest-ranked province overall. Although the pandemic was highly polarized in Quebec (e.g., with violent protests [35]), we acknowledge a limitation of our study: only English-language tweets were included in our analysis, whereas the main language in Quebec is French.

Finally, the correlation between polarization and the partisan vote margin is shown more clearly in the scatter plots of the two quantities for both countries in fig. 6(a). We see a strong and significant correlation between Republican vote share in the United States and the polarization index for discourse on masks and vaccines, but not lockdowns. The remaining associations (lockdowns for the US, and all topics for Canada) are not statistically significant (for Canada, this is in part due to the relatively small number of regions).

The strong correlation that we observe between conservative vote margin and the polarization score for vaccine discourse follows the well-known negative correlation between Republican vote share and vaccination rates [36]. Although vaccines were not yet available during this time period, we nevertheless present a comparison using official vaccination rates reported for the different states by the U.S. Centers for Disease Control and Prevention for the corresponding period one year later, after the vaccines were rolled out (i.e., October 11th, 2021 to January 3rd, 2022). Averaging on a weekly basis, we confirm this correlation in fig. 6(b), where we observe that vaccine polarization is strongly negatively correlated with vaccination rates in the different American states. We did not observe a similar pattern in Canada, due to the small sample size and the implementation of vaccine mandates. While disentangling the causal relationships among conservative vote margin, polarization score, and vaccination rate is not possible here, the results suggest that polarized discourse played a role in shaping the highly heterogeneous vaccination rates across the U.S.
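For readers who want to see how a correlation with a confidence interval of this kind can be computed, the sketch below uses Pearson's r with a 95% interval from the Fisher z-transformation. This is an assumption made for illustration: the text does not state how the reported intervals were obtained, and the function names `pearson_r` and `fisher_ci` and the toy data are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for r via Fisher's z-transformation (needs n > 3)."""
    if n <= 3 or abs(r) >= 1:
        raise ValueError("need n > 3 and |r| < 1")
    z = atanh(r)
    half_width = z_crit / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)


# Hypothetical toy data, not values from the paper.
x = [1, 2, 3, 4, 5]
y = [5, 3, 4, 1, 2]
r = pearson_r(x, y)
low, high = fisher_ci(r, len(x))

# Interval implied by this approximation for r = -0.77 with n = 51.
low_51, high_51 = fisher_ci(-0.77, 51)
```

Under this approximation, r = -0.77 with n = 51 gives an interval of roughly -0.86 to -0.63, close to the interval reported in fig. 6(b); small differences are expected from rounding of r or from a different interval method.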

3.2 Temporal Variation in Partisan Polarization

We next focus on the temporal trends of daily partisan polarization at the national level for each topic and overall, as displayed in fig. 7 for the United States and Canada, respectively. In these figures, the value of the metric fluctuates rapidly on the timescale of days. This is on the faster end of the range of timescales found in other topic tracking studies, e.g., [37]. These short timescales are consistent with our assumption that language adapts quickly in rapid anticipation of or as an immediate response to specific events. In particular, we considered two kinds of events. First, we preselected political and vaccine-related events (shown in table 1(a) and as vertical lines in fig. 7). These provide the scaffold for the socio-political trajectory of each country related to political discourse and pandemic response. Second, we detected highly polarized events through analysis of the highest two peaks in polarization (shown in table 1(b) and as red circles in fig. 7). We also show in the figures the tweet volume as a relative indicator of day-by-day reliability of the estimation of the polarization score. While we do not have direct causal evidence linking a highlighted peak in the polarization score to a specific event, we do find that many of the largest polarization peaks occur around highly contentious events related to each country’s specific context (table 1(b)). In the following two sections, we summarize these events and discuss how they relate to the topic for which the polarization simultaneously peaks.
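One simple way to pick out the highest two peaks in a daily series is to take local maxima and keep the largest ones, as in the sketch below. The peak criterion used for the analysis is not specified in this section, so the strict local-maximum rule, the function name `top_peaks`, and the toy series are assumptions made for illustration.

```python
def top_peaks(values, k=2):
    """Return indices of the k highest strict local maxima, highest first.

    A day counts as a peak only if its value is strictly greater than both
    neighbours, so endpoints and flat plateaus are ignored in this sketch.
    """
    peaks = [i for i in range(1, len(values) - 1)
             if values[i - 1] < values[i] > values[i + 1]]
    return sorted(peaks, key=lambda i: values[i], reverse=True)[:k]


# Hypothetical daily polarization values.
daily = [0.1, 0.5, 0.2, 0.9, 0.3, 0.4, 0.35]
peak_days = top_peaks(daily)
```

The returned indices can then be mapped back to calendar dates and compared against the most retweeted content on those days.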

(a) Major political and pandemic-related events in each country, such as when the FDA (U.S. Food & Drug Administration) and PHAC (Public Health Agency of Canada) approved vaccines.
Date | Event | Country
Nov. 3 | US National Election | US
Oct. 24 | BC General Election | Canada
Oct. 26 | Saskatchewan General Election | Canada
Dec. 8 | States resolve controversies | US
Dec. 9 | PHAC approves Pfizer vaccine | Canada
Dec. 11 | FDA approves vaccines | US
 | Electoral votes submitted | US
Dec. 14 | Vaccination begins | Canada
Dec. 20 | Moderna vaccine distributed | US
 | Election votes arrive | US
Dec. 23 | PHAC approves Moderna Vaccine | Canada
(b) Polarization peaks and their corresponding events. For each topic, we analyzed the two highest peaks, and inferred the content discussed on those peaks.
Topic | United States: Date | United States: Polarizing Event | Canada: Date | Canada: Polarizing Event
Lockdown | Nov. 1 | Viral tweet by Trump | Oct. 17 | Toronto Mask Measures Protest
Lockdown | Nov. 21 | Unidentified topic | Oct. 29 | Calgary Mask Measures Protest
Masks | Oct. 21 | Viral tweet by Trump | Oct. 12 | Unidentified topic
Masks | Nov. 14 | Biden proposes mandates | Nov. 14 | PHAC recommends masks
Vaccines | Dec. 20 | Moderna vaccine distributed | Dec. 20 | Moderna distributed in U.S.
Vaccines | Dec. 22 | Biden gets vaccinated | Dec. 23 | PHAC approves Moderna
Table 1: Major dates and peaks within the United States and Canada during 2020.
United States Polarization Timecourse

The left column of fig. 7 reports the daily polarization measured for the three key topics of lockdowns, masks, and vaccines. Peaks in polarization on the lockdown topic in fig. 7(a) may correspond to partisan differences in public support (or discontent) and discourse surrounding COVID-19 measures. In the days leading up to the 2020 Presidential Election on November 3rd, a pillar of President Trump’s campaign messaging on the pandemic characterized lockdowns as tyranny and economic repression [38]. For example, on November 1st, 2020, the date of the second-largest peak, Trump made a highly controversial claim by stating that the election was a choice between implementing deadly lockdown measures supported by Biden or an efficient end to the COVID-19 crisis with a safe vaccine [39]. Trump also made other similar claims on Twitter (X) during this period, e.g., when he said (sic): “Biden wants to LOCK DOWN our Country, maybe for years. Crazy! There will be NO LOCKDOWNS. The great American Comeback is underway!!!” [38].

Next, contentious debates related to masks were found to coincide with peaks in polarization, as shown in fig. 7(c). For example, while we did not find an event external to social media on October 31st, 2020, the date of the highest peak, we did find that the most retweeted tweet by Democrats on that day was “RT @JoeBiden: Be a patriot. Wear a mask.”. This, in turn, generated strong responses that day from Republicans, with the third most retweeted tweet within this group being: “RT @RealBrysonGray: There’s literally nothing patriotic about being so scared of a virus with a 99.9…”. This is possibly an example of polarization driven by influencer posts rather than by real-world events. Another set of divisive messages was observed on November 14th, 2020, the next highest peak, after presidential candidate Biden proposed mask mandates and South Dakota Governor Kristi Noem announced her opposition to this measure; nearly half of all the top retweets referred to Noem’s statement.

Finally, fig. 7(e) reports trends in polarization around the topic of vaccines. Here, some of the peaks observed are simultaneous with important events surrounding COVID-19 vaccine efficacy. We see that the second-largest peak occurred on December 22nd, 2020, a day after Biden received his first COVID-19 vaccine shot [40]. The most retweeted tweet for both partisan groups that day was “RT @JoeBiden: Today, I received the COVID-19 vaccine. To the scientists and researchers who worked tirelessly to make this possible - than…”. However, while supporters of Biden congratulated him, by tweeting messages like “RT @YAFBiden: And just like that, @JoeBiden has received the COVID-19 vaccine!”, opponents instead promoted pro-Trump messages, e.g. “RT @TheLeoTerrell: Finally a @JoeBiden confession. He finally gave credit to @realDonaldTrump and #OperationWarpSpeed. It’s about time.”.

Panels: (a) US Daily Lockdown Polarization; (b) Canada Daily Lockdown Polarization; (c) US Daily Mask Polarization; (d) Canada Daily Masks Polarization; (e) US Daily Vaccine Polarization; (f) Canada Daily Vaccine Polarization.
Figure 7: Daily trends of partisan polarization in the United States and Canada from October 9th, 2020 to January 3rd, 2021. The vertical dashed lines denote pre-selected political and vaccine-related events as explained in the text. In addition to the polarization measure (purple line), we also report the tweet volume, in log-scale, on the corresponding topic (yellow line) per day, which denotes the size of support for our measurement.
Canadian Polarization Timecourse

The right column of fig. 7 shows the daily polarization measured for the three key topics of lockdowns, masks, and vaccines. The pre-selected events in the Canadian timeline (table 1(a)) are marked as vertical lines in the figure.

In fig. 7(b), on the topic of lockdowns, we observe the highest peak on October 17th, 2020, which coincides with the Toronto anti-mask protest, a large demonstration where thousands of protesters rallied against COVID-19 lockdown measures. The second-highest peak is observed on November 29th, 2020, when the national news reported a Calgary Mask Measures protest on the preceding day [41].

The highest polarization peak on mask-related tweets is found on November 14th, 2020, as seen in fig. 7(d), coincident with the peak in the US plot, fig. 7(c), mentioned earlier. This event was discussed by conservative Party Family users, suggesting that partisan discourse in the U.S. might be driving some polarization in Canada.

Looking at the polarization of discussions about vaccines in fig. 7(f), we also observe that the highest peaks are in response to key vaccine-related events: the two highest peaks in polarization, observed on December 20th and 23rd, 2020, coincided with the distribution of the Moderna COVID-19 vaccines in the U.S. and Health Canada’s approval of the Moderna vaccine [42], respectively. While the former is an event associated with the United States, it led to discussions in Canada about vaccine prioritization and availability [43]. On the 20th, top retweets by liberal Party Family users focused on news of Republican politicians being first in line for the vaccine, while conservative Party Family users retweeted more diverse anti-vaccine sentiment. On December 23rd, 2020, the top retweets were strong sentiments in support of and in opposition to the approval.

Aggregate Polarization
Panels: (a), (b), (c), (d).
Figure 8: Daily aggregated partisan polarization. For the U.S. (a, b) and Canada (c, d), polarization is aggregated over topic by averaging over the values. We show pandemic-related new cases and deaths in the background for reference. Event-triggered average polarization is shown for the identified events listed in table 1(b). The shaded region denotes the standard deviation over the 5 events for each country.

To complement the granular analysis presented above, we also evaluated the measure’s overall responsiveness to polarizing events. In particular, we computed an average of the polarization score over topics (shown in fig. 8a,b) and then performed event-triggered averaging around such events, to show how the metric varies in time before and after these particular dates on average. This aggregate result (shown in fig. 8c) confirms a fast (on the order of days) and largely symmetric rise-and-decay profile around these polarization peaks.
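Event-triggered averaging of this kind can be sketched as follows: cut a fixed window around each event date out of the daily series, align the windows on the event day, and average across events at each lag. The function name `event_triggered_average`, the window length, the use of the population standard deviation, and the toy series are assumptions made for this illustration, not details taken from the analysis.

```python
from statistics import mean, pstdev


def event_triggered_average(series, event_indices, window=3):
    """Average a daily series around events, aligned on the event day.

    Returns a list of (lag, mean, std) for lag in -window..window.
    Events too close to either end of the series to fill a full
    window are skipped.
    """
    segments = [series[i - window:i + window + 1]
                for i in event_indices
                if i - window >= 0 and i + window < len(series)]
    if not segments:
        raise ValueError("no event has a complete window")
    profile = []
    for k, lag in enumerate(range(-window, window + 1)):
        vals = [seg[k] for seg in segments]
        profile.append((lag, mean(vals), pstdev(vals)))
    return profile


# Hypothetical daily scores with two events at indices 2 and 7.
series = [0, 1, 3, 1, 0, 0, 2, 6, 2, 0]
profile = event_triggered_average(series, [2, 7], window=1)
```

A symmetric rise and decay around lag 0 in the resulting profile, as in this toy example, is the kind of shape described above.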

3.3 The Relationship between Conspiracy and Polarization

Finally, we explored the relationship between conspiracy discourse and polarization using our time-resolved measurements of the number of conspiracy-related tweets. In particular, we compared it with the time course of the aggregate daily polarization presented in the previous section. The profiles broken down by conservative and progressive partisanship for the U.S. and Canada are shown in fig. 9(a) and fig. 9(c), respectively. For both countries, conservative partisans tweet conspiracy-related content in higher numbers than progressive partisans. The correlation with aggregate daily polarization for the U.S. and Canada (fig. 8(a) and fig. 8(c)) is shown in fig. 9(b) and fig. 9(d), respectively. For the U.S., we find a small but significant negative correlation with polarization.

Panels: (a), (b), (c), (d).
Figure 9: Relation between the volume of conspiracy-related content and the observed partisan polarization in the United States and Canada. In the left column, we report the volume of conspiracy-related tweets posted by users for the United States (a: affiliated with the Democratic and Republican parties) and Canada (c: affiliated with the liberal (left) Party Family (LPF: Liberal, New Democratic Party, Green) and the conservative (right) Party Family (RPF: Conservative, People’s Party)). On the right, we show the relation between daily partisan polarization summed over the different topics and the overall volume of conspiracy tweets. In the United States (b), we find a statistically significant correlation of -0.247 with CI=[-0.448,-0.023] (n = 88, p=0.031) between these measures. In Canada (d), the correlation of 0.023 with CI=[-0.187,0.231] (n = 88, p=0.831) between these measures is not statistically significant.

4 Discussion

This article investigated regional and event-triggered variation in partisan division within social media debates surrounding the introduction of COVID-19 public health measures across American states and Canadian provinces. Our computational analysis was centered around quantifying partisan polarization by analyzing the language used in millions of online messages from users affiliated with different political parties. In particular, we focused on Twitter (X) discussions related to three key public health interventions during the early phases of the COVID-19 pandemic: lockdowns, masks, and vaccines, as well as tracking the volume of conspiracy-related tweets. Our analysis explored the geographic heterogeneity of polarization and identified political events that likely influenced public opinion over time.

Like several other studies before us [e.g., 44, 17, 45, 46], we found that more right-leaning states and provinces exhibited greater partisan divisions around COVID-19 on Twitter (X), in particular concerning the topics of mask mandates and vaccine distribution. However, we went beyond these studies to characterize the geographic heterogeneity and time course of polarization, relating features in our polarization metric to real-world events. We looked into the relationship between polarization and public health initiatives in the U.S. and confirmed a strong negative correlation between partisan polarization and future vaccination rates, and a weak but significant negative correlation between the temporal profiles of the volume of conspiracy-related tweets and aggregate polarization. We did not observe similar patterns in Canada.

4.1 United States

The early phases of the COVID-19 pandemic in the United States prompted a variety of public discussions online that reflected strong regional variation of partisan support [25, 47, 48, 49, 50]. State-specific polarization obtained from our computational approach could be expected to be uniformly low, with the message content of conservative and liberal states each having internally homogeneous semantics. Instead, our analysis confirmed that polarization was notably higher in conservative states, where we found that Republican vote margins had a significant positive effect on polarization in discussions concerning masks and vaccines, after controlling for other factors.

This main result is consistent with previous studies that suggest conservative states exhibited higher levels of polarization in response to public health interventions compared to their liberal counterparts [10, 48]. Nevertheless, we found that this pattern does not hold across every state. Delaware, a liberal state, exhibited a distinctly high level of polarization, likely due to the strict public health measures implemented by the governor in response to the rapid increase in the number of cases during the first wave of the pandemic [51, 52, 53]. Conversely, low levels of polarization were observed in several conservative states, like Arkansas, Alabama, and Idaho, with a possible explanation coming from more unified opposition to restrictive COVID-19 measures like mask mandates [54]. These exceptions aside, most conservative states exhibited relatively high levels of polarization (fig. 6(a)).

Our analysis did not identify a single, general cause for this relationship. That said, commonalities in each state’s trajectory in the pandemic offer some clues: e.g., Mississippi, North Dakota, and Oklahoma experienced specific political decisions (such as mask mandates) made in the earlier phase of the pandemic that led to resistance despite rising COVID-19 cases [55, 56, 47]. In Mississippi, the decision by the governor to lift the statewide mask mandate in late September could also have contributed to heightened levels of polarization [55, 54]. A similar pattern was observed in North Dakota, South Dakota, and Oklahoma, where initial hesitancy to enforce mask mandates also appears to have led to increased partisan divisions [57, 58, 59, 60, 61, 62, 54]. This finding is not surprising, since the party affiliation of governors is the most important predictor of the widespread adoption of mask mandates [63]. One possible explanation for our main result that can be gleaned from these anecdotes is that pandemic severity increasingly strains the more uniform opposition to restrictive health measures in more conservative states, leading them to exhibit higher levels of polarization [64, 65, 54]. This source of polarization is distinct from that in states with a more equal distribution of partisans across competitive districts.

Through our approach, we could also dissect how polarization varies in time over the three topics we considered: lockdowns, masks, and vaccines. These three topics exhibited similar baseline levels of polarization during the period of study, which was between the second and third waves of the pandemic, punctuated by large positive deviations that typically rise and fall quickly. The prevalence of these deviations was smallest for the lockdown topic. Its low correlation with Republican party vote share suggests that it did not act as a meaningful indicator of partisan opposition. The polarization time course for masks and vaccines, however, contained many sharp peaks, many of which we were able to identify with a real-world event. For example, South Dakota initially experienced very high levels of mask polarization, coinciding with efforts by medical authorities to promote mask-wearing, despite Governor Kristi Noem’s opposition [54, 66]. Her public display of opposition to Biden’s suggestion of mask mandates led to one such peak in polarization. Similarly, North Dakota also displayed early signs of increased polarization on conspiracy theories [47], which may have been exacerbated by the posthumous electoral win of a Republican candidate who died from COVID-19 [67, 68, 69].

The strong negative correlation between vaccine polarization and vaccination rates that we observed in the U.S. indicates that states with higher vaccination rates were also less polarized around this issue [70]. The exact origin of this correlation is unclear. However, factors such as education and political ideology, which also have a strong geographic dependence, likely played a role [71]. Indeed, higher education levels are generally associated with greater vaccine acceptance and trust in vaccine safety [72]. Moreover, Democrats tend to trust the COVID-19 vaccines more and have been early adopters, whereas Republicans generally show lower levels of such trust [73].

Overall, the high levels of polarization observed in the United States relative to Canada point to a more divided society. Several studies have confirmed that conservative states and counties were less likely to adopt social distancing measures, impose mask mandates, and get vaccinated in the second and third waves of the pandemic. Our study offers new insights into these trends by demonstrating that they correlate with regional heterogeneity in social media discourse, particularly during salient political events around health measures. We also found that this discourse reflects changes in the pandemic timeline, initially related to stay-at-home lockdown orders, followed by mask mandates, and later transitioning to vaccines as they first became available.

4.2 Canada

The Canadian set of results also suggests that partisan divisions influenced public responses to COVID-19 measures in this country [74]. Compared to the U.S., we found a similar, albeit much weaker, association between polarization and conservative political leaning, with conservative provinces like Alberta and Saskatchewan experiencing higher levels of polarization during stricter lockdown measures than their more liberal counterparts, such as British Columbia and Ontario [75, 35]. Polarization levels varied over smaller and medium-sized provinces as well, measured as relatively high for New Brunswick and low for Nunavut, where the rates of COVID-19 infections remained relatively low during the pandemic (the sole COVID-free jurisdiction in North America until November 2020) [76, 77]. We also found that polarization surrounding mask mandates and vaccines was not homogeneously distributed across provinces.

Among the Canadian provinces, Quebec is an interesting case for our analysis of polarization. For example, our results confirmed that Quebec had the highest level of partisan division over vaccines, but also the highest reported incidence of COVID-19 in Canada during the first and second waves of the pandemic [78]. Quebec’s unique approach to managing the pandemic with its more restrictive measures relative to other provinces is also somewhat reflected in our results. After a relative hiatus with several restrictions relaxed in the summer of 2020, Quebec once again became the epicenter of the pandemic in the fall [79]. This resurgence led to the reinstatement of strict pandemic control measures and a ban on public demonstrations following significant anti-government protests against lockdowns and mask mandates [80, 81]. These events also coincided with an increase in online conflicts, promoted by Canadian far-right populist rhetoric and conspiracy theories on Twitter (X) [82]. It is important to note, however, that most of these conversations in Canada were heavily influenced by discussions in the US, with Canadians retweeting American vaccine-related content 8 times as often as Canadian content during the period covered by our study [83, 84]. Likewise, vaccine hesitancy was also linked to political affiliation in Canada, with those supporting the Conservative Party more likely to refuse vaccination [85].

As in the U.S. case, the polarization time course computed for Canada also exhibits spikes observed around key events like protests against lockdown measures, mask mandates, and vaccine roll-outs. These findings suggest that public reactions to significant political and social events during the pandemic are reflected in the measure of polarization we use. We did not observe the negative correlation between polarization and volume of conspiracy-related tweets that we saw in the U.S. case. This contrasts with [83], who found a reduction in negative sentiment in Canadian vaccine-related tweets between January and December 2020. The relationship between polarization and sentiment is complex and long-term trends are likely driven by processes besides pure volume of discussion around conspiracy theories [86].

Finally, the relatively lower influence of polarization on vaccine attitudes may be attributed to the country’s more widespread vaccine mandates [87, 88]. This prevalence, along with higher levels of trust in politicians [89] and social capital [90, 91], could have contributed to a broader acceptance of COVID-19 health interventions [84, 85]. Indeed, there was a rare ‘cross-partisan consensus’ among Canadians regarding emergency measures in the early stages of the pandemic [92]. This consensus, however, was not mirrored on social media, where conspiracy theories widely circulated [24, 84]. Overall, our results indicate that online discussions surrounding lockdowns, masks, and vaccines did mirror polarization, and were shaped by regional reactions to events and circumstances specific to Canadian provinces.

4.3 Limitations

While our method offers valuable insights, it comes with certain limitations. First, we viewed partisan polarization only through the proxy of semantic similarity. This choice may in certain cases obscure some signals not captured by the semantic embedding representation. Second, specifically in the Canadian context, we categorized users into liberal (left) and conservative (right) party family groups. During the manual annotation of Twitter (X) profiles, we encountered few users who identified as supporters of the Bloc Quebecois political party; therefore, we opted to exclude them from the analysis. Additionally, our classification of users into liberal and conservative partisan groups is based on self-reported information, which may not be entirely accurate. Third, it is important to note that our analysis is based on Twitter (X) data, which may not fully capture the views and sentiments of the broader American and Canadian public. Fourth, our analysis is restricted to tweets in English. In the context of Canada, this means we are capturing only or primarily the perspectives of either anglophones or bilingual francophones, which could potentially bias our data; for example, the high levels of polarization observed in Quebec on COVID-19 measures may be influenced by this language bias. Finally, while several of our analyses rely on correlations, it is crucial to remember that these results do not imply causation; the relationship between polarization and public health measures is complex and multi-dimensional.

4.4 Conclusion

To conclude, our method has provided valuable insights into the dynamics of partisan polarization during the COVID-19 pandemic. Political ideology, public trust, and key events have emerged as important factors influencing public discussions on pandemic-related issues in the United States and Canada. By combining our polarization measure with other data, researchers and practitioners can better understand how polarization varies across location, time, and specific issues. This knowledge could help in detecting particularly polarizing discussions on social media and in developing communication strategies to mitigate the spread of misinformation, both for the current pandemic and for future health-related crises.

The differences observed between these two countries are somewhat harder to explain. Our analysis, along with insights from recent studies, suggests that Canadian responses to public health measures could explain the lower levels of polarization found in Canada. Indeed, there was a significant consensus on the effectiveness of stay-at-home orders (i.e., lockdowns), mask mandates, and vaccines not only at the federal and provincial levels, but also within the news media. And unlike the U.S., where a large number of Republican leaders aligned with Trump’s anti-mask and anti-lockdown positions, the pandemic did not become a salient partisan issue within a political campaign in Canada until much later in 2021. Prior to this, the opposition to public health measures in Canada was primarily found in online communities, outside of the mainstream media and political parties, where protesters remained heavily influenced by American sources. Although our results suggest that social networks contributed to the diffusion of these opinions during the COVID-19 pandemic, more work needs to be done to quantify the impact of online community interactions on polarization.

5 Methodology

In the following section, we describe in detail our text-based measurement of partisan polarization. We first explain the data collection process. We then show how we classified tweets into respective topics, geo-located users and grouped them by party affiliation. Finally, we describe the equation used to measure partisan polarization as well as our approximation algorithm. Figure 1 provides a visual overview of our process in measuring partisan polarization. For additional details, please refer to Section 8 in the Supplementary Material.

5.1 Data Collection

5.1.1 Twitter (X) Data

We used Twitter’s (X) official API to collect 1% of real-time tweets for Canada and the United States from October 9, 2020 to January 4, 2021. This represents 231,841,790 tweets and 4,765,115 users for Canada (a dataset filtered for COVID and politics) and 387,090,097 tweets and 23,758,112 users for the United States (a dataset filtered for election politics). We fed the following list of keywords into the API to filter relevant tweets:

Canada: ‘trudeau’, ‘legault’, ‘doug ford’, ‘pallister’, ‘horgan’, ‘scott moe’, ‘jason kenney’, ‘dwight ball’, ‘blaine higgs’, ‘stephan mcneil’, ‘cdnpoli’, ‘canpol’, ‘cdnmedia’, ‘mcga’, ‘covidcanada’ and all combinations of ‘covid’ or ‘coronavirus’ as the prefix and the (full & abbreviated) name of each province and territory as the suffix.

United States: ‘JoeBiden’, ‘DonaldTrump’, ‘Biden’, ‘Trump’, ‘vote’, ‘election’, ‘2020Elections’, ‘Elections2020’, ‘PresidentElectJoe’, ‘MAGA’, ‘BidenHaris2020’, ‘Election2020’.

5.2 COVID-19 Vaccination Rate

Similar to the COVID-19 pandemic data, we also used the officially reported vaccination rate of the populations. We used the vaccination rates one year later compared to the Twitter (X) data, as COVID-19 vaccines were created and approved at the very end of our data collection process. Thus, the vaccination rates are for those who obtained at least two doses. For Canada, this is the $numtotal\_fully$ column from the government’s vaccine coverage map. We normalize this column by Canada’s 2021 population per province or territory. For the United States, we use the $people\_fully\_vaccinated\_per\_hundred$ column reported in the COVID Data Tracker from the CDC.

5.3 Classifying Tweets By Topics

For this study, we looked into three key topics for COVID-19: lockdowns, masks and vaccines. We also looked into conspiracy theories. For each topic, tweets were classified as relevant or irrelevant to the topic based on whether they contained at least one of the topic-specific keywords. For conspiracy-related tweets, relevant means that the content is related to COVID-19 conspiracy theories (either supporting or opposing). A tweet can belong to more than one topic.

We first used a hashtag-based filtering step. We extracted all hashtags within our dataset, ordered them by frequency, and discarded those that appeared fewer than 100 times. This filtered list contained 3,600 hashtags for Canada and 18,000 for the United States. Two political scientists manually annotated this list with topic and relevance labels. The list was narrowed to only those hashtags labeled as relevant, resulting in 631 relevant hashtags. We then merged these with hashtags identified in previous studies for the same topics, i.e., refs. [93, 94, 95].
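The frequency cut-off in this filtering step can be illustrated with a minimal sketch. It is not the authors' code: the function names, the regular expression used to extract hashtags, and the choice to lowercase hashtags before counting are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by word characters; case is ignored.
HASHTAG_RE = re.compile(r"#(\w+)")


def frequent_hashtags(tweets, min_count=100):
    """Return (hashtag, count) pairs seen at least `min_count` times,
    ordered from most to least frequent."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return [(tag, c) for tag, c in counts.most_common() if c >= min_count]


def tweets_matching(tweets, relevant_hashtags):
    """Keep tweets that contain at least one hashtag from the annotated list."""
    relevant = {tag.lower() for tag in relevant_hashtags}
    return [
        text for text in tweets
        if relevant & {tag.lower() for tag in HASHTAG_RE.findall(text)}
    ]
```

The list returned by `frequent_hashtags` corresponds to what annotators would label; `tweets_matching` then applies the hashtags they marked as relevant.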

For Canada, this process resulted in 46,636,206 tweets and 1,757,675 users that shared content related to COVID-19. For the United States, this represents 12,552,213 tweets and 2,657,355 users. Starting from the RoBERTa-base model [31] from HuggingFace, we further pre-trained the model on the respective COVID-19 tweets from each country dataset, i.e., using self-supervised learning to predict masked words within tweets. This results in two different country-specific pre-trained language models for COVID-19 tweets.

We then randomly sampled 200 relevant and 200 irrelevant tweets per topic from each dataset, for a total of 1,600 tweets. The same two political scientists manually reviewed each tweet separately to determine if the tweet was relevant or irrelevant. We discarded tweets where the annotators could not reach a consensus. We then trained the respective pre-trained RoBERTa-base model on each dataset to classify by topic, i.e., 4 topic language models per country, for a total of 8 language models. We report the support, Cohen’s kappa, F1-score and number of tweets we extracted for each topic within each dataset in Table 4. Our classifiers achieved a near-perfect F1-score for each of these topics.

5.4 Classifying Users by Geo-Location

We wanted to quantify the users in each province or state represented in our data, as the users retrieved from Twitter (X) could be imbalanced relative to region population size. For this, we geolocated all users with an explicit location provided in the location field, a free-form text, as part of their profile information. We processed this information with OpenStreetMap and the ArcGIS API. Both of these return a latitude and longitude if a location is found. We considered a geolocation clear if both APIs responded with latitudes and longitudes that were within one degree of each other. We found this to be more accurate than, and preferable to, using a pre-trained Named Entity Recognition algorithm; most of the user-provided locations can be handled by these Geographic Information System APIs, and the APIs could also provide important details such as the country, state and city. We correlated the number of geolocated users with the official population census for each country’s regions. In total, we geolocated 282,454 users with a strong correlation of 0.92 (n=13, p=6.20e-06, CI=[0.76, 0.98]) for Canada and 757,601 users with a strong correlation of 0.98 (n=52, p=9.27e-35, CI=[0.96, 0.99]) for the United States. This means that each region is well represented in our data. Further information is reported in Table 5.
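The agreement rule between the two geocoders can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation: the geocoder calls themselves are omitted, the function names are hypothetical, and we assume "within one degree" means that latitude and longitude each differ by at most one degree (with longitude wrapping at the antimeridian).

```python
def agree_within_one_degree(a, b, tol=1.0):
    """Check whether two geocoder answers agree.

    `a` and `b` are (latitude, longitude) tuples in degrees, or None when
    the corresponding geocoder found no match.
    """
    if a is None or b is None:
        return False
    dlat = abs(a[0] - b[0])
    dlon = abs(a[1] - b[1])
    dlon = min(dlon, 360.0 - dlon)  # handle wrap-around at +/-180 degrees
    return dlat <= tol and dlon <= tol


def resolve_location(first_result, second_result):
    """Return a coordinate only when both geocoders agree, else None.

    Returning the first geocoder's coordinate is an arbitrary choice made
    for this sketch.
    """
    if agree_within_one_degree(first_result, second_result):
        return first_result
    return None
```

Users for whom `resolve_location` returns `None` would simply be left out of the geolocated set.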

5.5 Classifying Users By Party Affiliation

We determine a user’s party affiliation using a two-step approach. First, we classify politically explicit users based on their profiles. We then use the predictions from this profile classifier as labels to train a classifier based on user activity. We report the support, F1-score, and number of users we classified for each party within each dataset in Table 6 for Canada and Table 9 for the United States. We achieve a macro-F1-score of 91% for both Canada and the United States.

Profile Classifier

As a preprocessing step, we filter out users that are not politically explicit. Politically explicit users are those whose profile description contains at least one political keyword defined for any political party. For Canada, we focused on the five main political parties: Conservative, Green, Liberal, New Democratic Party and People’s Party. For the United States, we focused on the Democratic and Republican parties. The following is the set of keywords we have per party:

Canada:

Conservative - ‘erin o’toole’, ‘andrew scheer’, ‘conservative’, ‘conservative party’, ‘cpc’, ‘cpc2021’, ‘cpc2019’, ‘conservative party of canada’

Green - ‘annamie paul’, ‘green party’, ‘gpc’, ‘gpc2019’, ‘gpc2021’, ‘green party of canada’

Liberal - ‘justin trudeau’, ‘liberal’, ‘liberal party’, ‘lpc’, ‘lpc2021’, ‘lpc2021’, ‘lpc2019’, ‘liberal party of canada’

New Democratic Party - ‘jagmeeet singh’, ‘new democrat’, ‘new democrats’, ‘new democratic party’, ‘ndp’, ‘ndp2021’, ‘ndp2019’

People’s Party - ‘maxime bernier’, ‘people’s party’, ‘ppc’, ‘ppc2019’, ‘ppc2021’, ‘people’s party of canada’

United States:

Democrat - ‘liberal’, ‘progressive’, ‘democrat’, ‘biden’

Republican - ‘conservative’, ‘gop’, ‘republican’, ‘trump’

We then randomly selected a set of politically explicit users for each party to be manually annotated by two political scientists. We only use party labels that both annotators agreed upon. The Cohen’s kappa score for the pair of annotation sets is 0.74 and 0.76 for Canada and the United States, respectively. We then train a RoBERTa-large model with an 80-20 train-test split to determine user party affiliation (see the profile classifier section for more details). Exact numbers can be found in their respective tables in the supplementary material.

Activity Classifier

For this second phase, we make use of the respective RoBERTa-base model pre-trained on COVID-19 tweets for each dataset to extract the tweet embeddings (768-dimensional vector). We then generate user embeddings (768-dimensional vector) by pooling together (mean aggregation) all tweet embeddings from that user.
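The pooling step can be written in a few lines. The sketch below uses plain Python lists in place of real 768-dimensional model outputs; the function names and the `min_tweets` argument (standing in for the activity threshold applied before training) are assumptions made for illustration.

```python
def mean_pool(tweet_embeddings):
    """Average a user's tweet embeddings dimension by dimension."""
    if not tweet_embeddings:
        raise ValueError("need at least one tweet embedding")
    n = len(tweet_embeddings)
    dim = len(tweet_embeddings[0])
    return [sum(vec[d] for vec in tweet_embeddings) / n for d in range(dim)]


def user_embeddings(tweets_by_user, min_tweets=1):
    """Map each sufficiently active user to one pooled embedding.

    `tweets_by_user` maps a user id to a list of tweet embeddings; users
    with fewer than `min_tweets` tweets are dropped.
    """
    return {
        user: mean_pool(vectors)
        for user, vectors in tweets_by_user.items()
        if len(vectors) >= min_tweets
    }
```

Because the mean preserves dimensionality, the user embedding lives in the same space as the tweet embeddings it summarizes.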

We then train an MLP consisting of two fully connected layers with the user embeddings as input to predict the party affiliation. Before training, we filtered out users based on their activity $\alpha$ (i.e., number of COVID-19-related tweets in the dataset). We performed a hyperparameter search for $\alpha$ among $\{1, 3, 5, 10, 15, 20\}$ using 5-fold cross validation. The selected threshold was 5 tweets for Canada and 10 tweets for the United States.

Specifically for Canada, we found that the MLP could not distinguish the parties sufficiently. Hence, we grouped the parties based on their partisan leaning. The liberal (left) party family included the Liberal Party, New Democratic Party and Green Party while the conservative (right) party family included the Conservative Party and People’s Party. We removed supporters of other minor parties and the Bloc Quebecois.

External Evaluation

Following the best practice for evaluating party affiliation predictions [96], we matched Twitter (X) users from the United States with the primary voter registration records available for five states (Ohio, New York, Florida, Arkansas, and North Carolina) as well as Washington DC. We describe this procedure and its results in more detail in the Supplementary Material (Section 10). We achieve an accuracy of 74.35% for the profile classifier and 73.35% for the activity classifier. Both classifiers are binary, but users can also be independent or support a third party, despite an ideology (and behavior/voting) that aligns strongly with one of the two main parties. They can also have an outdated registration that no longer reflects the beliefs they currently hold and express on Twitter (X). Therefore, although accuracy here is lower than on our held-out test data, it still indicates our classification is acceptable and on par with the standard for this type of evaluation [96].

5.6 Measuring Partisan Polarization

Given a set of political parties $\mathcal{P}$ and a set of given user embeddings $\mathcal{U}=\{u^{(1)},u^{(2)},\dots,u^{(n)}\}$ where $u^{(i)}\in\mathbb{R}^{768}$ and $n$ is the number of users, we measure polarization as follows.

We first look at the distance and dispersion between each party, $p\in\mathcal{P}$. We base our measure on the C-index of Hubert [32] to quantify the extent of clustering and overlap of each political party. This is done by first calculating the sum of within-cluster (intra-party) distances:

$$S_{w}=\frac{1}{2}\sum_{p\in\mathcal{P}}\sum_{u,v\in p}\|u-v\| \qquad (1)$$

We then normalize this value based on its minimum and maximum possible values, $S_{min}$ and $S_{max}$. These correspond to the sum of the $m$ smallest (resp. largest) distances between points in $\mathcal{U}$, where $m=\sum_{p\in\mathcal{P}}\frac{|p|(|p|-1)}{2}$.

Based on these, we define our polarization index $poli$ as:

$$poli=\frac{S_{max}-S_{w}}{S_{max}-S_{min}} \qquad (2)$$

The minimum value 0 represents no polarization, whereas the maximum value 1 represents the most extreme possible polarization, i.e., the parties $p\in\mathcal{P}$ are completely isolated from each other.
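As a small worked illustration of Equations 1 and 2, the following sketch computes the index exactly for a handful of labelled points. It is a minimal pure-Python rendering under stated assumptions, not the authors' implementation: the function name `poli`, the use of Euclidean distance via `math.dist`, and the convention of returning 0.0 when $S_{max}=S_{min}$ (where the ratio is undefined) are choices made for this example.

```python
import math
from itertools import combinations


def poli(embeddings, labels):
    """Exact polarization index (S_max - S_w) / (S_max - S_min).

    `embeddings` is a list of equal-length numeric vectors and `labels`
    gives the party of each vector.
    """
    pairs = list(combinations(range(len(embeddings)), 2))
    dists = [math.dist(embeddings[i], embeddings[j]) for i, j in pairs]

    # S_w: each within-party pair counted once, which matches the factor
    # 1/2 applied to the sum over ordered pairs in Equation 1.
    within = [d for d, (i, j) in zip(dists, pairs) if labels[i] == labels[j]]
    m = len(within)  # m = sum over parties of |p|(|p|-1)/2
    s_w = sum(within)

    ordered = sorted(dists)
    s_min = sum(ordered[:m])                  # m smallest distances
    s_max = sum(ordered[len(ordered) - m:])   # m largest distances

    if s_max == s_min:
        return 0.0  # assumption: treat the undefined case as no polarization
    return (s_max - s_w) / (s_max - s_min)
```

For two parties of two users each, $m = 1 + 1 = 2$. If the parties sit at {0, 1} and {10, 11} on a line, the within-party distances are the two smallest overall, so $S_w = S_{min}$ and the index is 1. If instead the parties are interleaved as {0, 10} and {1, 11}, then $S_w = 20$, $S_{min} = 2$, $S_{max} = 21$, and the index drops to $1/19$.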

5.7 Approximation of Polarization

Equation 2 is not scalable to large $n$, as it is $O(n^{2}\log(n^{2}))$. We approximate it by sub-sampling a sufficient set of users, which enables us to scale to a large number of users. To determine the minimum sample size needed, we use the coefficient of variation, which is defined as $\frac{std}{mean}$ and expressed as a percentage. Generally, a coefficient of variation under 10% gives reasonable results [97].

Algorithm 1 summarizes this procedure. This approximation allows us to scale our measure significantly without compromising accuracy. One loop of the approximation has a time complexity of $O(rf^{2}\cdot n^{2}\log(n^{2}))$, where $r$ is the repeat count and $f$ is the fraction (e.g., 0.01). From our testing, we know that at large values of $n$, the fraction needed rarely increases, so only one loop is required. Therefore, for large values of $n$, the cost of the approximation relative to the exact computation scales as $rf^{2}$.

We test the accuracy of the approximation in Algorithm 1 over the daily lockdown and vaccine tweets. In Figure 19(a), we plot the total number of users against the absolute error (and its standard deviation) of the approximated $poli$ compared to the exact value, binned for every 1,000 users. We observe a dramatic drop in the absolute error at around 3,000 users. When we reach 10,000 users, the absolute error is usually below 0.001. In Figure 19(b), we plot the total number of users against the time saved in running the approximation algorithm compared to running the exact $poli$, also binned for every 1,000 users. We observe that the time saved is exponential in the number of users. We note that at around 50,000 users, the approximation rarely needs to increase the fraction of users sampled.

These findings confirm that we can accurately approximate $poli$ for large-scale data that is impossible to measure exactly because of memory constraints. As $poli$ relies heavily on finding pairwise distances (time and memory intensive), we see from our analysis that a sampling approach can save both time and memory exponentially.

Algorithm 1 Approximating $poli$
1: Require: $\mathcal{U}$, $fraction$, $epsilon$, $step\_size$, $repeats$
2: while $cv\_poli > epsilon$ do
3:     $poli\_indices \leftarrow []$
4:     $num\_of\_runs \leftarrow 0$
5:     for $i$ in $repeats$ do
6:         $\mathcal{U}_{s} \leftarrow sample(\mathcal{U}, fraction)$
7:         $poli\_indices.append(poli(\mathcal{U}_{s}))$
8:     end for
9:     $mean\_poli \leftarrow mean(poli\_indices)$
10:     $std\_poli \leftarrow std(poli\_indices)$
11:     $cv\_poli \leftarrow std\_poli / mean\_poli$
12:     $fraction \leftarrow fraction + step\_size$
13: end while
14: return $mean\_poli$
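The loop in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: `poli_fn` is a hypothetical placeholder for the exact index, the default parameter values are arbitrary, and the population standard deviation is assumed. Two details that the pseudocode leaves open are also assumptions here: the loop body runs once before the coefficient of variation is first checked, and the loop stops once the whole user set is sampled.

```python
import random
import statistics
from typing import Callable, Sequence, TypeVar

User = TypeVar("User")


def approximate_poli(
    users: Sequence[User],
    poli_fn: Callable[[Sequence[User]], float],  # placeholder for the exact index
    fraction: float = 0.1,
    epsilon: float = 0.05,
    step_size: float = 0.1,
    repeats: int = 5,
    seed: int = 0,
) -> float:
    """Average poli_fn over random subsamples until the estimates are stable."""
    if not users:
        raise ValueError("users must not be empty")
    rng = random.Random(seed)
    n_users = len(users)
    while True:
        fraction = min(fraction, 1.0)
        sample_size = min(n_users, max(2, round(fraction * n_users)))
        # Repeat the measurement on independent subsamples of the same size.
        estimates = [poli_fn(rng.sample(users, sample_size)) for _ in range(repeats)]
        mean_poli = statistics.fmean(estimates)
        std_poli = statistics.pstdev(estimates)
        # Coefficient of variation; treated as infinite when the mean is zero.
        cv_poli = std_poli / abs(mean_poli) if mean_poli else float("inf")
        # Assumption: stop when stable, or when every user is already sampled.
        if cv_poli <= epsilon or fraction >= 1.0:
            return mean_poli
        fraction += step_size
```

Any function that maps a list of users to a number can be passed as `poli_fn` for experimentation, for example `approximate_poli(values, statistics.fmean)` on a list of floats.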

6 Acknowledgements

This research is supported by CIFAR AI Catalyst Grants and Canada CIFAR AI Research Chair funding.

References

  • \bibcommenthead
  • [1] Hart, P. S., Chinn, S. & Soroka, S. Politicization and polarization in covid-19 news coverage. Science Communication 42, 679–697 (2020). URL https://doi.org/10.1177/1075547020950735.
  • [2] Iyengar, S., Lelkes, Y., Levendusky, M., Malhotra, N. & Westwood, S. J. The origins and consequences of affective polarization in the united states. Annual Review of Political Science 22, 129–146 (2019). URL https://doi.org/10.1146/annurev-polisci-051117-073034.
  • [3] Pew Research Center, D., Washington. Political polarization in the american public. https://www.pewresearch.org/politics/2014/06/12/political-polarization-in-the-american-public/ (2014).
  • [4] Amlani, S. & Algara, C. Partisanship & nationalization in american elections: Evidence from presidential, senatorial, & gubernatorial elections in the us counties, 1872–2020. Electoral Studies 73, 102387 (2021).
  • [5] Stewart, A. J., McCarty, N. & Bryson, J. J. Polarization under rising inequality and economic decline. Science advances 6, eabd4201 (2020).
  • [6] Chu, H., Yang, J. Z. & Liu, S. Not my pandemic: Solution aversion and the polarized public perception of covid-19. Science Communication 43, 508–528 (2021).
  • [7] Gadarian, S. K., Goodman, S. W. & Pepinsky, T. B. Partisanship, health behavior, and policy attitudes in the early stages of the covid-19 pandemic. Plos one 16, e0249596 (2021).
  • [8] Wu, J. D. & Huber, G. A. Partisan differences in social distancing may originate in norms and beliefs: Results from novel data. Social Science Quarterly n/a. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/ssqu.12947.
  • [9] Gollwitzer, A. et al. Partisan differences in physical distancing are linked to health outcomes during the covid-19 pandemic. Nature human behaviour 4, 1186–1197 (2020).
  • [10] Allcott, H. et al. Polarization and public health: Partisan differences in social distancing during the coronavirus pandemic. Journal of public economics 191, 104254 (2020).
  • [11] Kahane, L. H. Politicizing the mask: Political, economic and demographic factors affecting mask wearing behavior in the usa. Eastern Economic Journal 1 – 21 (2021).
  • [12] Milosh, M., Painter, M., Sonin, K., Van Dijcke, D. & Wright, A. L. Unmasking partisanship: Polarization undermines public response to collective risk. Journal of Public Economics 204, 104538 (2021).
  • [13] Fridman, A., Gershon, R. & Gneezy, A. Covid-19 and vaccine hesitancy: A longitudinal study. PLOS ONE 16, 1–12 (2021). URL https://doi.org/10.1371/journal.pone.0250123.
  • [14] Druckman, J. N., Klar, S., Krupnikov, Y., Levendusky, M. & Ryan, J. B. Affective polarization, local contexts and public opinion in america. Nature human behaviour 5, 28–38 (2021).
  • [15] Ojea Quintana, I., Reimann, R., Cheong, M., Alfano, M. & Klein, C. Polarization and trust in the evolution of vaccine discourse on twitter during covid-19. PLos One 17, e0277292 (2022).
  • [16] Ward, J. K. et al. The french public’s attitudes to a future covid-19 vaccine: The politicization of a public health issue. Social Science and Medicine 265, 113414 (2020). URL https://www.sciencedirect.com/science/article/pii/S027795362030633X.
  • [17] Jiang, J., Ren, X., Ferrara, E. et al. Social media polarization and echo chambers in the context of covid-19: Case study. JMIRx med 2, e29570 (2021).
  • [18] Medeiros, M. & Gravelle, T. B. Pandemic populism: Explaining support for the people’s party of canada in the 2021 federal election. Canadian Journal of Political Science/Revue canadienne de science politique 56, 413–434 (2023).
  • [19] Pammett, J. H. & Dornan, C. The canadian federal election (2022).
  • [20] Blouin Genest, G., Burlone, N., Champagne, E., Eastin, C. & Ogaranko, C. Translating covid-19 emergency plans into policy: A comparative analysis of three canadian provinces. Policy Design and Practice 4, 115–132 (2021).
  • [21] Adolph, C., Amano, K., Bang-Jensen, B., Fullman, N. & Wilkerson, J. Pandemic politics: Timing state-level social distancing responses to covid-19. Journal of Health Politics, Policy and Law 46, 211–233 (2021).
  • [22] Ashokkumar, A. & Pennebaker, J. W. Social media conversations reveal large psychological shifts caused by covid-19’s onset across us cities. Science advances 7, eabg7843 (2021).
  • [23] Wu, J. D. & Huber, G. A. Partisan differences in social distancing may originate in norms and beliefs: Results from novel data. Social Science Quarterly (2021).
  • [24] Bridgman, A. et al. The causes and consequences of COVID-19 misperceptions: Understanding the role of news and social media. HKS Misinfo Review (2020).
  • [25] Jiang, J., Chen, E., Yan, S., Lerman, K. & Ferrara, E. Political polarization drives online conversations about covid-19 in the united states. Human Behavior and Emerging Technologies 2, 200–211 (2020).
  • [26] Gallotti, R., Valle, F., Castaldo, N., Sacco, P. & De Domenico, M. Assessing the risks of ‘infodemics’ in response to covid-19 epidemics. Nature Human Behaviour 4, 1285–1293 (2020).
  • [27] Lang, J., Erickson, W. W. & Jing-Schmidt, Z. # maskon!# maskoff! digital polarization of mask-wearing in the united states during covid-19. PloS one 16, e0250817 (2021).
  • [28] Rodriguez, C. G., Gadarian, S. K., Goodman, S. W. & Pepinsky, T. B. Morbid polarization: Exposure to covid-19 and partisan disagreement about pandemic response. Political psychology 43, 1169–1189 (2022).
  • [29] Clinton, J., Cohen, J., Lapinski, J. & Trussler, M. Partisan pandemic: How partisanship and public health concerns affect individuals’ social mobility during covid-19. Science advances 7, eabd7204 (2021).
  • [30] Diermeier, D., Godbout, J.-F., Yu, B. & Kaufmann, S. Language and ideology in congress. British Journal of Political Science 42, 31–55 (2012).
  • [31] Liu, Y. et al. Roberta: A robustly optimized bert pretraining approach (2019). 1907.11692.
  • [32] Hubert, L. & Levin, J. A general statistical framework for assessing categorical clustering in free recall. Psychological Bulletin 83, 1072–1080 (1976).
  • [33] Cascini, F. et al. eClinicalMedicine 48 (2022). URL https://doi.org/10.1016/j.eclinm.2022.101454.
  • [34] Wicke, P. & Bolognesi, M. M. Framing covid-19: How we conceptualize and discuss the pandemic on twitter. PLOS ONE 15, 1–24 (2020). URL https://doi.org/10.1371/journal.pone.0240010.
  • [35] Rowe, D. Police hand out hundreds of fines at montreal anti-lockdown demonstration. CTV News (2020). URL https://montreal.ctvnews.ca/police-hand-out-hundreds-of-fines-at-montreal-anti-lockdown-demonstration-1.5239414.
  • [36] Albrecht, D. Vaccination, politics and covid-19 impacts. BMC Public Health 22, 96 (2022). URL https://doi.org/10.1186/s12889-021-12432-x.
  • [37] Leskovec, J., Backstrom, L. & Kleinberg, J. Meme-tracking and the dynamics of the news cycle, 497–506 (2009).
  • [38] Algara, C., Amlani, S., Collitt, S., Hale, I. & Kazemian, S. Nail in the coffin or lifeline? evaluating the electoral impact of covid-19 on president trump in the 2020 election. Political Behavior (2022). URL https://doi.org/10.1007/s11109-022-09826-x.
  • [39] Bryant, M. Donald trump tries to stoke fears of covid lockdown under joe biden. The Guardian (2020). URL https://www.theguardian.com/us-news/2020/nov/02/trump-biden-coronavirus-covid-lockdown.
  • [40] Higgings, T. Joe biden receives covid vaccine on live television, encourages americans to get inoculated. cnbc (2020).
  • [41] Rieger, S. Anti-mask protests show need for better public health messaging, calgary researcher says. cbc (2020).
  • [42] Canada, H. Health canada authorizes moderna covid-19 vaccine. https://www.canada.ca/en/health-canada/news/2020/12/health-canada-authorizes-moderna-covid-19-vaccine.html (2020).
  • [43] White, L. B., Joseph & O’Donnell, C. Moderna covid-19 vaccine begins rollout as u.s. races to broaden injection campaign. https://www.reuters.com/business/healthcare-pharmaceuticals/moderna-covid-19-vaccine-begins-rollout-us-races-broaden-injection-campaign-2020-12-19/ (2020).
  • [44] Jiang, X. et al. Polarization over vaccination: Ideological differences in twitter expression about covid-19 vaccine favorability and specific hesitancy concerns. Social Media+ Society 7, 20563051211048413 (2021).
  • [45] Rathje, S., He, J. K., Roozenbeek, J., Van Bavel, J. J. & van der Linden, S. Social media behavior is associated with vaccine hesitancy. PNAS nexus 1, pgac207 (2022).
  • [46] Bollyky, T. J. et al. Assessing covid-19 pandemic policies and behaviours and their economic and educational trade-offs across us states from jan 1, 2020, to july 31, 2022: an observational analysis. The Lancet 401, 1341–1360 (2023).
  • [47] Rao, A. et al. Political partisanship and anti-science attitudes in online discussions about covid-19. arXiv preprint arXiv:2011.08498 (2020).
  • [48] Morris, D. S. Polarization, partisanship, and pandemic: The relationship between county-level support for donald trump and the spread of covid-19 during the spring and summer of 2020. Social Science Quarterly 102, 2412–2431 (2021).
  • [49] Sehgal, N. J., Yue, D., Pope, E., Wang, R. H. & Roby, D. H. The association between covid-19 mortality and the county-level partisan divide in the united states: study examines the association between covid-19 mortality and county-level political party affiliation. Health Affairs 41, 853–863 (2022).
  • [50] Kaashoek, J. et al. The evolving roles of us political partisanship and social vulnerability in the covid-19 pandemic from february 2020–february 2021. PLOS global public health 2, e0000557 (2022).
  • [51] Governor Carney, J. Governor carney announces additional covid-19 restrictions. Delaware.gov (2020). URL https://news.delaware.gov/2020/11/17/governor-carney-announces-additional-covid-19-restrictions/.
  • [52] Neiburg, J. Covid-19 in delaware: Here are the latest restrictions and what you need to know. delaware online (2020).
  • [53] Goldstein, N. D. & Suder, J. S. Application of state law in the public health emergency response to covid-19: an example from delaware in the united states. Journal of Public Health Policy 42, 167–175 (2021). URL https://doi.org/10.1057/s41271-020-00257-8.
  • [54] Adolph, C. et al. Governor partisanship explains the adoption of statewide mask mandates in response to covid-19. State Politics & Policy Quarterly 22, 24–49 (2022).
  • [55] Wislon, R. Why mississippi’s governor revoked a statewide mask mandate. The Hill (2020). URL https://thehill.com/homenews/state-watch/519658-why-mississippis-governor-revoked-a-statewide-mask-mandate/.
  • [56] Haines, M. Mask-resistant north dakota town battles pandemic spike. VOA (2020). URL https://www.voanews.com/a/covid-19-pandemic_mask-resistant-north-dakota-town-battles-pandemic-spike/6197674.html.
  • [57] Carter, J. & Anthony, W. Reeves: Mississippi ‘not going to participate’ in nationwide lockdown. wlbt (2020). URL https://www.wlbt.com/2020/11/12/watch-live-reeves-press-conference/.
  • [58] Lopez, G. Why north and south dakota are suffering the worst covid-19 epidemics in the us. Vox (2020). URL https://www.vox.com/future-perfect/2020/10/27/21534480/north-dakota-south-dakota-covid-coronavirus-pandemic-third-wave.
  • [59] Allcott, H. et al. Polarization and public health: Partisan differences in social distancing during the coronavirus pandemic. Journal of Public Economics 191, 104254 (2020). URL https://www.sciencedirect.com/science/article/pii/S0047272720301183.
  • [60] of Oklahoma, S. Governor stitt, commissioner frye announce statewide covid-19 mitigation efforts in executive order. Oklahoma, Governor J. Kevin Stitt (2020). URL https://oklahoma.gov/governor/newsroom/newsroom/2020/december/governor-stitt--commissioner-frye-announce-statewide-covid-19-mi.html.
  • [61] Macpherson, J. North dakota governor changes tack and issues mask mandate. AP (2020). URL https://apnews.com/article/bismarck-north-dakota-coronavirus-pandemic-45788d9dfaae23c1db8125088dfa242b.
  • [62] Hallas, L., Hatibie, A., Majumdar, S., Pyarali, M. & Hale, T. Variation in us states’ responses to covid-19. University of Oxford (2021).
  • [63] Mayer, M. K. et al. Politics or need? explaining state protective measures in the coronavirus pandemic. Social Science Quarterly 103, 1140–1154 (2022).
  • [64] Grossman, G., Kim, S., Rexer, J. M. & Thirumurthy, H. Political partisanship influences behavioral responses to governors’ recommendations for covid-19 prevention in the united states. Proceedings of the National Academy of Sciences 117, 24144–24153 (2020).
  • [65] Gusmano, M. K., Miller, E. A., Nadash, P. & Simpson, E. J. Partisanship in initial state responses to the covid-19 pandemic. World Medical & Health Policy 12, 380–389 (2020).
  • [66] AP. South dakota medical groups promote masks, countering noem. Valley News Live (2020). URL https://www.valleynewslive.com/2020/10/28/south-dakota-medical-groups-promote-masks-countering-noem/.
  • [67] Kaurm, H. A north dakota state legislature candidate who died from covid-19 appears to have won his election. CNN (2020). URL https://www.cnn.com/2020/11/04/politics/north-dakota-candidate-died-covid-wins-election-trnd/index.html.
  • [68] Higgins, T. North dakota republican who died of covid-19 wins seat in state legislature. CNBC (2020). URL https://www.cnbc.com/2020/11/04/north-dakota-man-who-died-of-covid-19-wins-seat-in-state-legislature.html.
  • [69] Douglas, K. M. et al. Understanding conspiracy theories. Political psychology 40, 3–35 (2019).
  • [70] Ye, X. Exploring the relationship between political partisanship and covid-19 vaccination rate. Journal of Public Health 45, 91–98 (2023).
  • [71] Bollyky, T. J. et al. Assessing covid-19 pandemic policies and behaviours and their economic and educational trade-offs across us states from jan 1, 2020, to july 31, 2022: an observational analysis. The Lancet 401, 1341–1360 (2023). URL https://doi.org/10.1016/S0140-6736(23)00461-0.
  • [72] Miller, J. Education is a bigger factor than race in desire for covid-19 vaccine (2021). URL https://news.usc.edu/182848/education-covid-19-vaccine-safety-risks-usc-study/.
  • [73] Latkin, C. A., Dayton, L., Yi, G., Konstantopoulos, A. & Boodram, B. Trust in a covid-19 vaccine in the us: A social-ecological perspective. Social science & medicine 270, 113684 (2021).
  • [74] Pennycook, G., McPhetres, J., Bago, B. & Rand, D. G. Beliefs about covid-19 in canada, the united kingdom, and the united states: A novel test of political polarization and motivated reasoning. Personality and Social Psychology Bulletin 48, 750–765 (2022).
  • [75] Cheung, C., Lyons, J., Madsen, B., Miller, S. & Sheikh, S. The bank of canada covid-19 stringency index: measuring policy response across provinces. Tech. Rep., Bank of Canada (2021).
  • [76] Cameron-Blake, E. et al. Variation in the canadian provincial and territorial responses to covid-19. Blavatnik School of Government Working Paper Series 39 (2021).
  • [77] Akanteva, A., Dick, D. W., Amiraslani, S. & Heffernan, J. M. Canadian covid-19 pandemic public health mitigation measures at the province level. Scientific Data 10, 882 (2023). URL https://doi.org/10.1038/s41597-023-02759-y.
  • [78] Dubé, È., Dionne, M., Pelletier, C., Hamel, D. & Gadio, S. Covid-19 vaccination attitudes and intention among quebecers during the first and second waves of the pandemic: findings from repeated cross-sectional surveys. Human Vaccines & Immunotherapeutics 17, 3922–3932 (2021).
  • [79] Shim, E. Regional variability in covid-19 case fatality rate in canada, february–december 2020. International Journal of Environmental Research and Public Health 18, 1839 (2021).
  • [80] Montpetit, J. Quebec sets new record for covid-19 cases amid pockets of resistance to safety measures. https://www.cbc.ca/news/canada/montreal/anti-mask-demonstration-quebec-covid-19-cases-1.5849499 (2020). Accessed: 2024-03-01.
  • [81] Gazette, M. Thousands of montrealers march to protest against wearing masks. https://montrealgazette.com/news/thousands-of-montrealers-march-to-protest-against-wearing-masks (2020). Accessed: 2024-03-01.
  • [82] Chaput, M. Figures de l’identité anti-masque et rhétorique de l’organisationnalité. Communication & Organisation 107–120 (2021).
  • [83] Owen, T. et al. Understanding vaccine hesitancy in canada: attitudes, beliefs, and the information ecosystem (2020).
  • [84] Boucher, J.-C. et al. Analyzing social media to explore the attitudes and behaviors following the announcement of successful covid-19 vaccine trials: infodemiology study. JMIR infodemiology 1, e28800 (2021).
  • [85] Burns, K. E., Dubé, È., Nascimento, H. G. & Meyer, S. B. Examining vaccine hesitancy among a diverse sample of canadian adults. Vaccine 42, 129–135 (2024).
  • [86] Van Bavel, J. J., Rathje, S., Harris, E., Robertson, C. & Sternisko, A. How social media shapes polarization. Trends in Cognitive Sciences 25, 913–916 (2021).
  • [87] Karaivanov, A., Kim, D., Lu, S. E. & Shigeoka, H. Covid-19 vaccination mandates and vaccine uptake. Nature Human Behaviour 6, 1615–1624 (2022).
  • [88] Cameron-Blake, E. et al. Variation in the canadian provincial and territorial responses to covid19”. Blavatnik School of Government Working Paper URL www.bsg.ox.ac.uk/covidtracker. Available:.
  • [89] Mansoor, M. Citizens’ trust in government as a function of good governance and government agency’s provision of quality information on social media during covid-19. Government information quarterly 38, 101597 (2021).
  • [90] Hetherington, M. J. & Rudolph, T. J. Political trust and polarization (2017).
  • [91] Makridis, C. A. & Wu, C. How social capital helps communities weather the covid-19 pandemic. PloS one 16, e0245135 (2021).
  • [92] Merkley, E. et al. A rare moment of cross-partisan consensus: Elite and public response to the COVID-19 pandemic in canada. Can. J. Polit. Sci. 53, 311–318 (2020).
  • [93] Kouzy, R. et al. Coronavirus goes viral: Quantifying the covid-19 misinformation epidemic on twitter. Cureus 12, e7255 (2020). URL https://europepmc.org/articles/PMC7152572.
  • [94] Al-Ramahi, M., Elnoshokaty, A., El-Gayar, O., Nasralah, T. & Wahbeh, A. Public discourse against masks in the covid-19 era: Infodemiology study of twitter data. JMIR Public Health Surveill 7, e26780 (2021). URL https://doi.org/10.2196/26780.
  • [95] Ahmed, W., López Seguí, F., Vidal-Alaball, J. & Katz, M. S. Covid-19 and the “film your hospital” conspiracy theory: Social network analysis of twitter data. J Med Internet Res 22, e22374 (2020). URL http://www.jmir.org/2020/10/e22374/.
  • [96] Barberá, P. Birds of the same feather tweet together: Bayesian ideal point estimation using twitter data. Political analysis 23, 76–91 (2015).
  • [97] Reed, G. F., Lynn, F. & Meade, B. D. Use of coefficient of variation in assessing variability of quantitative assays. Clinical and Vaccine Immunology 9, 1235–1239 (2002).

7 Extended Results

7.1 Regional Variations of Partisan Polarization

Figure 10: Ranking of American states by partisan polarization per topic. A ranking of 1 signifies the highest average weekly polarization between October 11th, 2020 and January 3rd, 2021 (12 weeks). State names are colored based on the party ratio from the 2020 United States Presidential Election, where bluer means more users voted for the Democratic Party and redder means more users voted for the Republican Party. Red states are mostly ranked higher than blue states.
Figure 11: Ranking of Canadian provinces and territories by partisan polarization per topic. A ranking of 1 signifies the highest average weekly polarization between October 11th, 2020 and January 3rd, 2021 (12 weeks). Province or territory names are colored based on the party ratio from Canada’s 2021 Federal Election, where bluer means more users from the liberal (left) party family (Liberal, New Democratic Party, Green), and redder means more users from the conservative (right) party family (Conservative, People’s Party).

In Figures 10 and 11, we show the ranking of the regions (American states and Canadian provinces and territories, respectively) by partisan polarization per topic.
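The ranking described above (rank 1 for the region with the highest average weekly polarization) can be sketched as below. The function name and the input layout, a mapping from region to its list of weekly scores, are assumptions made for illustration only.

```python
from statistics import fmean


def rank_regions(weekly_polarization: dict[str, list[float]]) -> dict[str, int]:
    """Map each region to its rank; rank 1 = highest mean weekly polarization."""
    means = {region: fmean(scores) for region, scores in weekly_polarization.items()}
    # Sort regions by descending mean; ties keep their input order.
    ordered = sorted(means, key=means.get, reverse=True)
    return {region: rank for rank, region in enumerate(ordered, start=1)}
```

Applying the function once per topic (lockdown, mask, vaccine) would give one ranking column per topic.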

Figure 12: Correlation between Polarization Score and Vote Margin for the Conservative Party. Colors indicate the vote margin. The Pearson $r$ correlation and the corresponding $p$-value are shown in each panel. A significant correlation between Polarization Score and Vote Margin is found for the US discourse on Masks and on Vaccines.

In Figure 12, we show the correlation between the polarization scores and the relative vote margins. The figure shows that, across US states, the margin by which Republican Party votes exceeded those of the Democratic Party correlates significantly with the amount of polarization exhibited by the Twitter (X) discourse in those states when conditioning the discourse on masks and on vaccines, but not significantly when conditioning on lockdowns.

7.2 The Relationship between Regional Vaccine Polarization and Vaccination Rates in Canada

In Figure 13, we remove Nunavut as an outlier because of its very small population. Although we obtain a strong positive correlation with the vaccination rate, it is computed over a relatively small number of points. In Canada, vaccines were mandated, with vaccine passports required to be served in public areas. We hypothesize that partisan polarization over vaccines increased because people were unhappy about being compelled to vaccinate, even though most of the population was still vaccinated. With so few points, however, we cannot draw a definite conclusion from this result.

Figure 13: Relation between vaccine polarization and vaccination rates in Canada. The correlation is 0.74 with CI = [0.31, 0.92] (n = 12, p = 0.004). The Vaccine Partisan Polarization for each province or territory is computed weekly and averaged over the 12 weeks from October 11th, 2020 to January 3rd, 2021. Official vaccination rates for each region are obtained from Statistics Canada. The Vaccination Rate is likewise averaged weekly over the same period one year later, after the vaccines had been rolled out, i.e. October 11th, 2021 to January 3rd, 2022. The color of each point is determined by the respective party ratio from the 2021 Canadian federal election.

7.3 Specific Regional Partisan Polarization and COVID-19 Deaths

Here, we investigate topic-specific polarization over time and how it relates to the reported number of COVID-19 deaths in specific regions, in Figure 14 for Canada and in Figure 15 for the United States.

(a) Weekly Partisan Polarization & Death Rate in Alberta
(b) Weekly Partisan Polarization & Death Rate in British Columbia
(c) Weekly Partisan Polarization & Death Rate in Ontario
(d) Weekly Partisan Polarization & Death Rate in Quebec
Figure 14: Weekly trends of partisan polarization in Canada for the four largest provinces from October 11th, 2020 to January 3rd, 2021. We report the average death rate per week (red dotted line); the correlation between each topic-specific polarization and the death rate is given in brackets in the legend.
(a) Weekly Partisan Polarization & Death Rate in Mississippi
(b) Weekly Partisan Polarization & Death Rate in Vermont
(c) Weekly Partisan Polarization & Death Rate in Delaware
(d) Weekly Partisan Polarization & Death Rate in Iowa
Figure 15: Weekly trends of partisan polarization in the United States for the highest ranked state overall (a), the lowest ranked state (b), the highest ranked liberal state (c), and the lowest ranked conservative state (d) from October 11th, 2020 to January 3rd, 2021. We report the average death rate per week (red dotted line); the correlation between each topic-specific polarization and the death rate is given in brackets in the legend.

7.4 Correlation Matrices

Table 2: Correlation Matrix between Topic Polarization and External Data in Canada. Bolded means $p<0.001$. Italicized means $p<0.01$. Underlined means $p<0.05$. Background color of green or red signifies the positive or negative correlation for significant p-values only.
Topic Measure Cases Deaths Conspiracy (Volume) Stringency Index
Lockdown Polarization -0.338 (CI=[-0.511,-0.138], p=0.001) -0.280 (CI=[-0.462,-0.075], p=0.008) 0.190 (CI=[-0.020,0.384], p=0.076) -0.518 (CI=[-0.657,-0.347], p=0.000)
Lockdown Volume 0.244 (CI=[0.037,0.432], p=0.022) 0.171 (CI=[-0.040,0.367], p=0.112) -0.084 (CI=[-0.288,0.128], p=0.437) 0.259 (CI=[0.052,0.444], p=0.015)
Lockdown % Volume -0.368 (CI=[-0.536,-0.172], p=0.000) -0.340 (CI=[-0.513,-0.141], p=0.001) 0.149 (CI=[-0.062,0.348], p=0.165) -0.391 (CI=[-0.555,-0.198], p=0.000)
Lockdown Weighted Polarization -0.402 (CI=[-0.564,-0.210], p=0.000) -0.368 (CI=[-0.536,-0.172], p=0.000) 0.172 (CI=[-0.039,0.368], p=0.110) -0.443 (CI=[-0.597,-0.258], p=0.000)
Mask Polarization 0.117 (CI=[-0.095,0.318], p=0.279) 0.102 (CI=[-0.110,0.305], p=0.345) -0.132 (CI=[-0.332,0.080], p=0.221) 0.011 (CI=[-0.198,0.220], p=0.916)
Mask Volume -0.202 (CI=[-0.395,0.008], p=0.059) -0.306 (CI=[-0.485,-0.104], p=0.004) 0.462 (CI=[0.280,0.612], p=0.000) -0.267 (CI=[-0.452,-0.061], p=0.012)
Mask % Volume -0.688 (CI=[-0.784,-0.559], p=0.000) -0.689 (CI=[-0.785,-0.561], p=0.000) 0.615 (CI=[0.465,0.730], p=0.000) -0.804 (CI=[-0.867,-0.715], p=0.000)
Mask Weighted Polarization -0.687 (CI=[-0.783,-0.557], p=0.000) -0.688 (CI=[-0.784,-0.559], p=0.000) 0.610 (CI=[0.459,0.726], p=0.000) -0.804 (CI=[-0.867,-0.715], p=0.000)
Vaccine Polarization 0.271 (CI=[0.066,0.455], p=0.011) 0.361 (CI=[0.164,0.530], p=0.001) -0.202 (CI=[-0.395,0.007], p=0.059) 0.254 (CI=[0.047,0.440], p=0.017)
Vaccine Volume 0.718 (CI=[0.599,0.806], p=0.000) 0.658 (CI=[0.520,0.762], p=0.000) -0.364 (CI=[-0.533,-0.168], p=0.000) 0.711 (CI=[0.590,0.801], p=0.000)
Vaccine % Volume 0.683 (CI=[0.553,0.781], p=0.000) 0.672 (CI=[0.538,0.773], p=0.000) -0.532 (CI=[-0.667,-0.363], p=0.000) 0.781 (CI=[0.684,0.851], p=0.000)
Vaccine Weighted Polarization 0.687 (CI=[0.558,0.784], p=0.000) 0.678 (CI=[0.546,0.777], p=0.000) -0.533 (CI=[-0.668,-0.364], p=0.000) 0.782 (CI=[0.684,0.852], p=0.000)
Aggregated Sum -0.121 (CI=[-0.322,0.091], p=0.261) -0.044 (CI=[-0.251,0.167], p=0.681) 0.023 (CI=[-0.187,0.231], p=0.831) -0.315 (CI=[-0.492,-0.113], p=0.003)
Aggregated Weighted Sum -0.062 (CI=[-0.268,0.149], p=0.566) 0.027 (CI=[-0.183,0.236], p=0.799) -0.032 (CI=[-0.240,0.178], p=0.764) -0.256 (CI=[-0.442,-0.049], p=0.016)
Table 3: Correlation Matrix between Topic Polarization and External Data in the United States. Bolded means $p<0.001$. Italicized means $p<0.01$. Underlined means $p<0.05$. Background color of green or red signifies the positive or negative correlation for significant p-values only.
Topic Measure Cases Deaths Conspiracy (Volume) Stringency Index
Lockdown Polarization -0.020 (CI=[-0.244,0.207], p=0.867) -0.070 (CI=[-0.291,0.158], p=0.548) -0.146 (CI=[-0.360,0.082], p=0.207) 0.010 (CI=[-0.216,0.235], p=0.931)
Lockdown Volume -0.196 (CI=[-0.403,0.031], p=0.090) -0.319 (CI=[-0.508,-0.100], p=0.005) 0.497 (CI=[0.306,0.650], p=0.000) -0.262 (CI=[-0.460,-0.038], p=0.022)
Lockdown % Volume -0.394 (CI=[-0.569,-0.185], p=0.000) -0.436 (CI=[-0.602,-0.233], p=0.000) 0.120 (CI=[-0.109,0.336], p=0.303) -0.356 (CI=[-0.538,-0.142], p=0.002)
Lockdown Weighted Polarization -0.395 (CI=[-0.570,-0.186], p=0.000) -0.442 (CI=[-0.607,-0.240], p=0.000) 0.101 (CI=[-0.128,0.319], p=0.387) -0.354 (CI=[-0.537,-0.140], p=0.002)
Mask Polarization -0.103 (CI=[-0.321,0.125], p=0.376) -0.108 (CI=[-0.325,0.121], p=0.355) -0.078 (CI=[-0.298,0.150], p=0.502) -0.121 (CI=[-0.337,0.107], p=0.298)
Mask Volume -0.311 (CI=[-0.501,-0.092], p=0.006) -0.378 (CI=[-0.556,-0.167], p=0.001) 0.500 (CI=[0.310,0.652], p=0.000) -0.288 (CI=[-0.482,-0.067], p=0.012)
Mask % Volume -0.541 (CI=[-0.683,-0.360], p=0.000) -0.526 (CI=[-0.672,-0.341], p=0.000) -0.012 (CI=[-0.237,0.214], p=0.915) -0.452 (CI=[-0.615,-0.252], p=0.000)
Mask Weighted Polarization -0.565 (CI=[-0.701,-0.390], p=0.000) -0.547 (CI=[-0.688,-0.367], p=0.000) -0.020 (CI=[-0.245,0.206], p=0.861) -0.474 (CI=[-0.632,-0.279], p=0.000)
Vaccine Polarization -0.246 (CI=[-0.446,-0.021], p=0.033) -0.216 (CI=[-0.421,0.010], p=0.061) -0.268 (CI=[-0.466,-0.046], p=0.019) -0.133 (CI=[-0.348,0.096], p=0.253)
Vaccine Volume 0.360 (CI=[0.147,0.542], p=0.001) 0.305 (CI=[0.085,0.496], p=0.007) 0.303 (CI=[0.084,0.495], p=0.008) 0.196 (CI=[-0.031,0.404], p=0.089)
Vaccine % Volume 0.666 (CI=[0.518,0.775], p=0.000) 0.682 (CI=[0.539,0.786], p=0.000) -0.068 (CI=[-0.289,0.160], p=0.561) 0.573 (CI=[0.400,0.707], p=0.000)
Vaccine Weighted Polarization 0.633 (CI=[0.475,0.751], p=0.000) 0.655 (CI=[0.505,0.767], p=0.000) -0.115 (CI=[-0.332,0.113], p=0.322) 0.576 (CI=[0.403,0.710], p=0.000)
Aggregated Sum -0.184 (CI=[-0.393,0.044], p=0.112) -0.196 (CI=[-0.403,0.031], p=0.090) -0.247 (CI=[-0.448,-0.023], p=0.031) -0.119 (CI=[-0.335,0.110], p=0.307)
Aggregated Weighted Sum -0.217 (CI=[-0.422,0.009], p=0.060) -0.198 (CI=[-0.405,0.029], p=0.086) -0.193 (CI=[-0.401,0.034], p=0.095) -0.091 (CI=[-0.310,0.137], p=0.434)
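Each cell of the correlation matrices pairs a correlation coefficient with a confidence interval. The sketch below shows one common way to obtain such values: Pearson's $r$ and a 95% interval from the Fisher z-transformation. Whether the reported intervals were computed this way is an assumption, and the function names are illustrative. As a sanity check under that assumption, $r = -0.338$ with $n = 88$ gives approximately $[-0.511, -0.138]$, which agrees with the Lockdown Polarization and Cases entry for Canada.

```python
import math
from statistics import NormalDist, fmean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Confidence interval for r via the Fisher z-transformation (needs n > 3)."""
    z = math.atanh(r)
    # The standard error of z is 1 / sqrt(n - 3).
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


low, high = fisher_ci(-0.338, 88)
print(f"CI = [{low:.3f}, {high:.3f}]")
```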

7.5 The Relationship between National Partisan Polarization and Epidemiological Data

Here, we investigate the aggregated polarization over time for each country and how it relates to the reported numbers of new COVID-19 cases and deaths. In Figure 16, we observe that polarization is not significantly correlated with the severity of the pandemic in either Canada or the United States. To compute the daily aggregate polarization measure, we take a weighted sum of each topic’s polarization, where the weight is the percentage of that topic’s tweets within that day’s volume of COVID-19-related tweets.
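The weighted sum described above can be sketched for a single day as follows. The function and argument names are hypothetical, and the weights are assumed to be each topic's tweet count divided by that day's total number of COVID-19-related tweets, so they need not sum to one.

```python
def aggregate_polarization(
    topic_polarization: dict[str, float],
    topic_volume: dict[str, int],
    total_volume: int,
) -> float:
    """Weighted sum of topic polarization; weight = topic share of daily volume."""
    if total_volume <= 0:
        raise ValueError("total_volume must be positive")
    return sum(
        polarization * topic_volume[topic] / total_volume
        for topic, polarization in topic_polarization.items()
    )


# Illustrative numbers only, not values from the dataset.
daily = aggregate_polarization(
    {"lockdown": 0.2, "mask": 0.4, "vaccine": 0.6},
    {"lockdown": 100, "mask": 300, "vaccine": 600},
    total_volume=2000,
)
print(daily)
```

Repeating this for every day yields the daily series that is compared with new cases and deaths.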

Figure 16: Daily aggregate polarization vs. COVID-19 new cases and deaths for (a) Canada and (c) the United States from October 9th, 2020 to January 3rd, 2021. Correlation of polarization and COVID-19 deaths for (b) Canada and (d) the United States. The correlation coefficients are -0.044 for Canada with CI = [-0.251, 0.167] (n = 88, p = 0.681) and -0.196 for the United States with CI = [-0.403, 0.031] (n = 88, p = 0.090).
Panels: (a) Canada: Daily Polarization Trends; (b) Canada: Polarization vs. Deaths; (c) United States: Daily Polarization Trends; (d) United States: Polarization vs. Deaths.
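For readers who want to check intervals of this kind, the sketch below computes a confidence interval for a Pearson correlation using the Fisher z-transformation. The source does not state which interval method was used, so this is an assumption; the function name `fisher_ci` is ours. With r = -0.044 and n = 88 it gives an interval of roughly [-0.251, 0.167].

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Confidence interval for a Pearson correlation via the Fisher z-transform."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                      # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)            # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


low, high = fisher_ci(-0.044, 88)
print(f"CI = [{low:.3f}, {high:.3f}]")
```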

8 Methodology Details

In this section, we report the classification metrics for each module of the pipeline used to measure polarization.

8.1 Classifying Tweets by Topics

Table 4: Tweet Topic Classification Metrics. 200 relevant and 200 irrelevant tweets were sampled for manual annotation. The F1-score is calculated on the labels on which both annotators agreed and is reported over 5 runs with different random seeds.
Canada
Topic Relevant Irrelevant Cohen's Kappa F1-Score # of Tweets
Lockdown 170 356 0.73 97.13 ± 1.57 1,553,984
Mask 292 282 0.91 98.48 ± 1.24 1,994,293
Vaccine 326 248 0.91 99.84 ± 0.31 2,145,549
Conspiracy 338 171 0.67 97.20 ± 0.71 16,575,934
United States
Topic Relevant Irrelevant Cohen's Kappa F1-Score # of Tweets
Lockdown 126 199 0.63 100.00 ± 0.00 897,565
Mask 197 200 0.98 99.49 ± 0.62 1,562,706
Vaccine 192 201 0.96 100.00 ± 0.00 1,541,360
Conspiracy 195 155 0.75 95.55 ± 2.39 926,389

8.2 Classifying Users by Geolocation

Table 5: Number of Geolocated Users and Correlation with Population. The correlation is computed against the official 2021 population census for each country.
Canada
Users Total Correlation
Geolocated 282,454 0.92 (n=13, p=6.20e-06, CI=[0.76, 0.98])
w/ Party Affiliation 195,456 0.92 (n=13, p=6.83e-06, CI=[0.76, 0.98])
United States
Users Total Correlation
Geolocated 757,601 0.98 (n=52, p=9.27e-35, CI=[0.96, 0.99])
w/ Party Affiliation 242,056 0.97 (n=52, p=2.51e-34, CI=[0.96, 0.99])

8.3 Classifying User Party Affiliation

We report the classification metrics for Canada in Table 6 and for the United States in Table 9. For Canada, our activity-based model also classified users by individual party, but the F1-score was unsatisfactory because parties within the liberal (left) party family and within the conservative (right) party family were easily confused with one another, as shown in the confusion matrix in Table 7. Thus, for our analysis, we merged the Canadian parties into these two families; the confusion matrix after the merge is shown in Table 8.

Table 6: Canadian User Party Affiliation Classification. Cohen's Kappa score of 0.74. RPF and LPF denote the right and left party families.
Classifier Party Support F1-Score # of Users
Profile CPC 98 92.93 ± 1.12 1,769
Profile GPC 60 88.50 ± 1.54 97
Profile LPC 100 90.89 ± 1.34 783
Profile NDP 124 93.44 ± 0.53 370
Profile NO_PARTY 105 86.16 ± 1.97 667
Profile PPC 95 94.09 ± 1.34 402
Activity RPF 628 93.85 ± 0.47 193,225
Activity LPF 357 89.10 ± 0.73 299,836
Combined RPF - - 196,338
Combined LPF - - 302,023
Table 7: Canadian User Party Affiliation Activity Classifier Confusion Matrix
CPC PPC GPC LPC NDP
CPC 398 24 1 29 6
PPC 59 49 0 1 1
GPC 3 0 6 10 6
LPC 18 3 2 161 26
NDP 5 1 1 19 58
Table 8: Canadian User Party Affiliation Activity Classifier Confusion Matrix - Liberal (left) and Conservative (right) Party Family
Right Party Family Left Party Family
Right Party Family 530 38
Left Party Family 27 289
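To illustrate how merging parties into families reduces the confusion, the sketch below collapses the counts of Table 7 into a two-by-two matrix. The assignment of CPC and PPC to the right family and of GPC, LPC, and NDP to the left family is our assumption, as are the function and variable names; the source also does not say whether rows are true or predicted labels, which does not affect the collapsing itself.

```python
PARTIES = ["CPC", "PPC", "GPC", "LPC", "NDP"]

# Assumed family assignment (not spelled out in the source).
FAMILY = {"CPC": "Right", "PPC": "Right", "GPC": "Left", "LPC": "Left", "NDP": "Left"}

# Counts copied from Table 7; rows and columns follow the order of PARTIES.
TABLE_7 = [
    [398, 24, 1, 29, 6],
    [59, 49, 0, 1, 1],
    [3, 0, 6, 10, 6],
    [18, 3, 2, 161, 26],
    [5, 1, 1, 19, 58],
]


def collapse_confusion(matrix, labels, label_to_group, groups):
    """Sum the cells of a confusion matrix into coarser groups of labels."""
    merged = {row: {col: 0 for col in groups} for row in groups}
    for i, row_label in enumerate(labels):
        for j, col_label in enumerate(labels):
            merged[label_to_group[row_label]][label_to_group[col_label]] += matrix[i][j]
    return merged


merged = collapse_confusion(TABLE_7, PARTIES, FAMILY, ["Right", "Left"])
for family, row in merged.items():
    print(family, row)
```

Run on the Table 7 counts, this gives 530 and 38 in the right-family row and 30 and 289 in the left-family row. Three of these cells equal the values in Table 8; the remaining one is 30 here versus 27 in Table 8, so Table 8 may come from a separately evaluated merged classifier rather than a pure relabeling, which the source does not specify.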
Table 9: American User Party Affiliation Classification. Cohen's Kappa score of 0.76. Total 763,164 users.
Classifier Party Support F1-Score # of Users
Profile Republican 854 97.21 ± 0.66 86,989
Profile Democrat 928 97.40 ± 0.63 82,923
Activity Republican 10,583 92.98 ± 0.18 239,449
Activity Democrat 10,226 92.99 ± 0.16 145,733
Combined Republican - - 336,231
Combined Democrat - - 426,933

8.4 Distribution Matching with Election Results

We further validate our inferred party affiliation distribution against the results of the 2019 Canadian federal election and the 2020 US election. For each region, we compute the ratio of users labelled as belonging to the liberal party family to users labelled as belonging to the conservative party family, and correlate it with the corresponding ratio in the election results. We obtain strong correlations of 0.815 for Canada (Figure 17) and 0.802 for the United States (Figure 18). We also visualize this ratio for all geolocated users and for each topic.
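As a rough sketch of this validation step, the code below computes a liberal-to-conservative ratio per region for both users and votes and then the Pearson correlation between the two. The region names and counts are hypothetical placeholders, not values from our data, and the exact definition of the ratio used in the analysis may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def left_right_ratio(counts):
    """Map region -> (left, right) counts to region -> left / right."""
    return {region: left / right for region, (left, right) in counts.items()}


# Hypothetical (left, right) counts per region, for illustration only.
users = {"A": (120, 80), "B": (60, 90), "C": (200, 100), "D": (50, 150)}
votes = {"A": (5500, 4500), "B": (4200, 5800), "C": (6400, 3600), "D": (3000, 7000)}

user_ratio = left_right_ratio(users)
vote_ratio = left_right_ratio(votes)
regions = sorted(users)
r = pearson([user_ratio[g] for g in regions], [vote_ratio[g] for g in regions])
print(f"r = {r:.3f}")
```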

Figure 17: Normalized distribution of the inferred user party affiliations compared to the Canadian 2019 election results.
Panels: (a) 2019 Election Results; (b) Geolocated Users; (c) Lockdown; (d) Mask; (e) Vaccine; (f) Conspiracy.
Figure 18: Empirical distribution of the inferred user party affiliations compared to the US 2020 election results. Interestingly, the lockdown distribution matches the election results more closely, while mask and vaccine have a higher liberal ratio and conspiracy has a higher conservative ratio.
Panels: (a) 2020 Election Results; (b) Geolocated Users; (c) Lockdown; (d) Mask; (e) Vaccine; (f) Conspiracy.

8.5 Matching Users to US Voter Registration

Following the best practice for evaluating party affiliation predictions [96], we matched Twitter (X) users from our dataset with the primary voter registration records available for five states (Ohio, New York, Florida, Arkansas, and North Carolina) as well as Washington DC. From these records, we obtain the party affiliation of unique voters in each state by MD5-hashing their name and county to construct a key identifier. From our set of geolocated Twitter (X) users, we kept those who belonged to one of the five states or DC and removed those whose location could not be retrieved. We pre-processed each user’s name on Twitter (X) to remove emojis. We then matched the most recent voter party affiliation record from the registration data to each unique Twitter (X) user whose county matched and whose name matched either the first and last name or the first, middle, and last name. After matching, we removed users not affiliated with either of the two major parties and users whose name matched more than one record per county (indicating a non-unique match).
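The matching procedure above can be sketched as follows. This is a simplified illustration under stated assumptions, not our production code: the record fields (`first`, `middle`, `last`, `county`, `party`), the normalization rules, the `DEM`/`REP` party codes, and all function names are hypothetical.

```python
import hashlib
import unicodedata

MAJOR_PARTIES = {"DEM", "REP"}  # assumed party codes


def normalize(text):
    """Lowercase, keep only letters and spaces (drops emojis), collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    kept = "".join(ch for ch in text if ch.isalpha() or ch.isspace())
    return " ".join(kept.lower().split())


def make_key(name, county):
    """MD5 key built from a normalized name and county."""
    raw = f"{normalize(name)}|{normalize(county)}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def build_index(voter_records):
    """Map key -> party, keeping only keys that identify a single voter record."""
    candidates = {}
    for rec in voter_records:
        names = {f"{rec['first']} {rec['last']}"}
        if rec.get("middle"):
            names.add(f"{rec['first']} {rec['middle']} {rec['last']}")
        # A set of keys ensures one record never counts twice for the same key.
        for key in {make_key(name, rec["county"]) for name in names}:
            candidates.setdefault(key, []).append(rec["party"])
    return {key: parties[0] for key, parties in candidates.items() if len(parties) == 1}


def match_users(users, index):
    """Return user id -> registered party for unique matches to a major party."""
    matched = {}
    for user in users:
        party = index.get(make_key(user["name"], user["county"]))
        if party in MAJOR_PARTIES:
            matched[user["id"]] = party
    return matched


# Hypothetical records for illustration only.
voters = [
    {"first": "Ada", "middle": "M", "last": "Lovelace", "county": "Wake", "party": "DEM"},
    {"first": "John", "middle": "", "last": "Smith", "county": "Wake", "party": "REP"},
    {"first": "John", "middle": "Q", "last": "Smith", "county": "Wake", "party": "DEM"},
]
users = [
    {"id": 1, "name": "Ada Lovelace 🚀", "county": "wake"},
    {"id": 2, "name": "John Smith", "county": "Wake"},
]
print(match_users(users, build_index(voters)))
```

In this example the first user matches a single record, while "John Smith" in the same county corresponds to two voter records and is therefore discarded as a non-unique match.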

We then compare each user's party from voter registration with their predicted party, first using the prediction from their profile description and second the prediction from their COVID-19-related tweets. As Table 10 shows, our geolocation and voter record matching lets us match more than 30,000 users across the five states and DC to their voter records.

Table 10: Users matched to their voter registration. Voters* and Users* respectively correspond to the number of unique voters in the records and unique Twitter (X) users in our data.
State Voters* Users* Matched Democrat Republican Other
Ohio 7,771,590 4,913 1,431 320 193 917
New York 17,718,437 30,927 8,255 4,843 1,631 1,781
Florida 14,477,882 50,541 12,905 5,585 4,508 2,810
Arkansas 1,722,465 4,311 1,280 145 140 995
District of Columbia 510,026 17,661 2,538 1,929 153 456
North Carolina 8,004,814 20,761 6,050 2,450 1,655 1,945
Total 32,456 15,272 8,280 8,904

We obtain an accuracy of 74.35% for the first method and 73.35% for the second. We note that both methods are binary classifiers, whereas users can also be independent or support a third party even when their ideology (and behavior or voting) aligns strongly with one of the two main parties. They can also have an outdated registration that no longer reflects the beliefs they currently hold and express on Twitter (X). Therefore, although accuracy here is lower than in the preceding evaluations, it still indicates that our classification is acceptable and on par with the standard for this type of evaluation [96].

8.6 Approximation of Polarization

We test the accuracy of the approximation in Algorithm 1 on the daily Lockdown and Vaccine tweets.

In Figure 19(a), we plot the total number of users against the absolute error (and its standard deviation) of the approximated $poli$ compared to the exact value, binned per 1,000 users. We observe a dramatic drop in the absolute error at around 3,000 users. Once we reach 10,000 users, the absolute error is usually below 0.001.

In Figure 19(b), we plot the total number of users against the time saved by running the approximation algorithm instead of the exact $poli$, also binned per 1,000 users. We observe that the time saved grows exponentially with the number of users. We note that at around 50,000 users, the approximation rarely needs to increase the fraction of users sampled.

These findings confirm that we can accurately approximate $poli$ for large-scale data that is impossible to measure exactly because of memory constraints. As $poli$ relies heavily on computing pairwise distances, which is both time and memory intensive, our analysis shows how sampling can save both time and memory, with savings that grow exponentially with the number of users.

Figure 19: For large enough data, the approximation error is close to 0.001. Our approximation of polarization also saves time exponentially as the number of instances grows.
Panels: (a) Approximation Quality; (b) Time Saved in Seconds.

We also explore the impact of changing the minimum coefficient of variation. We start with a minimum sample size of 1%, or $fraction = 0.01$. We keep the step_size constant at 0.01. For the first experiment, $repeats$ is set to $10$. For the second experiment, $epsilon$ is set to $0.05$. The results are shown in Figures 20 and 21.
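To make the roles of these parameters concrete, the sketch below shows one way an adaptive sampling loop of this kind could work. It is not a reproduction of Algorithm 1: we assume the loop grows the sampled fraction by step_size until the coefficient of variation across `repeats` random samples falls to `epsilon` or below, and we use the mean absolute pairwise distance as a simple stand-in for $poli$. All function names are hypothetical.

```python
import random
import statistics


def mean_pairwise_distance(values):
    """Placeholder statistic standing in for poli: mean absolute pairwise distance."""
    n = len(values)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += abs(values[i] - values[j])
    return total / (n * (n - 1) / 2)


def approximate_statistic(values, statistic, fraction=0.01, step_size=0.01,
                          repeats=10, epsilon=0.05, seed=0):
    """Grow the sampled fraction until repeated estimates are stable.

    Returns (estimate, fraction_used). Stability is judged by the coefficient
    of variation (standard deviation / mean) across `repeats` random samples.
    """
    if repeats < 2:
        raise ValueError("repeats must be at least 2 to measure variation")
    rng = random.Random(seed)
    while fraction < 1.0:
        k = max(2, int(len(values) * fraction))
        estimates = [statistic(rng.sample(values, k)) for _ in range(repeats)]
        mean = statistics.fmean(estimates)
        spread = statistics.stdev(estimates)
        if mean != 0 and spread / abs(mean) <= epsilon:
            return mean, fraction
        # Rounding avoids floating-point drift in the accumulated fraction.
        fraction = round(fraction + step_size, 10)
    # Fall back to the exact computation on all values.
    return statistic(values), 1.0


# Synthetic data for illustration only.
data_rng = random.Random(1)
data = [data_rng.random() for _ in range(500)]
estimate, used = approximate_statistic(data, mean_pairwise_distance)
print(f"estimate={estimate:.3f} using fraction={used:.2f}")
```

Under these assumptions, a smaller `epsilon` forces the loop to sample a larger fraction before stopping, and a larger `repeats` value costs more time per step.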

Figure 20: Approximation of $poli$. Green dots mean time was saved while red dots mean time was lost. The circle size is the absolute error times 1000. The inner plot zooms in and doubles the circle sizes for clarity. We observe that the fraction of instances needed increases as we decrease the required minimum coefficient of variation. This matches our intuition, as a larger sample approximates the full data more closely. We also see that the lower the minimum coefficient of variation, the smaller the error (circles are much smaller). On all plots, we also see that we always save time (green circles) if there are more than 10,000 instances.
Figure 21: Approximation of $poli$. As in the preceding figure, we observe that increasing the number of repeats reduces the error in our approximation (smaller circles). Likewise, the amount of time saved decreases as we increase the repeat count (the number of red circles increases with the repeats). We also observe a slight general increase in the fraction of instances needed as we increase the repeats.