Information disorder on Facebook around the 2020 US Presidential election

Francesco Bailo (University of Sydney) - also with Justin Miller (University of Sydney), Rohan Alexander (University of Toronto)

3rd UNSW RESILIENT DEMOCRACY LAB WORKSHOP

2026-02-27

Acknowledgement of Country

I would like to acknowledge the Traditional Owners of Australia and recognise their continuing connection to land, water and culture. The University of Sydney is located on the land of the Gadigal people of the Eora Nation. I pay my respects to their Elders, past and present.

The Challenge: Information Disorder in Democracy

Citizens are immersed in information environments where coherent understanding becomes impossible

  • Traditional fact-checking approach has fundamental limitations:
    • Requires establishing contested ground truths
    • Doesn’t scale across millions of posts
    • Misses how contradictory legitimate perspectives undermine sense-making

Key distinction: Healthy pluralism → Chaotic pluralism

  • Citizens possess epistemic rights to sufficient information AND competence to navigate information systems
  • When undermined, consequences extend beyond confusion to degradation of democratic deliberation

Our Contribution: Measuring Disorder Without Adjudicating Truth

Framework: Information environments as networks of semantic relationships

  • Infons (Devlin, 1991; Floridi, 2011): discrete, meaningful units of information (individual posts)
  • Relationships between infons: agreement, disagreement, or independence
  • No reference to ground truth needed - assess mutual support/contradiction

Information Disorder Measure:

\[ D = \frac{|E^-|}{|E^+| + |E^-| + |E^0|} \]

Where \(E^+\) = agreement, \(E^-\) = disagreement, \(E^0\) = independence

Ranges from 0 (no disagreement) to 1 (complete disagreement)
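As a minimal sketch, the measure maps directly onto the three edge counts (the helper below is illustrative, not project code):

```python
def disorder(n_agree: int, n_disagree: int, n_independent: int) -> float:
    """Information disorder D = |E-| / (|E+| + |E-| + |E0|).

    0 when no classified relationship is a disagreement,
    1 when every classified relationship is a disagreement.
    """
    total = n_agree + n_disagree + n_independent
    if total == 0:
        raise ValueError("no classified pairs")
    return n_disagree / total
```

Note that independent pairs stay in the denominator: a large share of unrelated claims dilutes D even when the few related claims all contradict each other.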

Methodological Innovation: Supra-Infon

Supra-infon: Anchoring claim outside immediate information space

“The election has been administered fairly so far”

  • Each infon classified as agreeing, disagreeing, or independent with respect to this reference
  • Enables measurement of discourse alignment with specific positions
  • Still without adjudicating truth value

Data Collection

Source: Facebook via CrowdTangle API (🪦 RIP)

Account types:

  • 749 curated lists (Local News, Politics, Metro groups, etc.)
  • Republican and Democrat officials, state parties, PACs

Coverage:

  • 969,207 posts from 38,149 accounts
  • October 26 - December 1, 2020
  • Election Day was 3 Nov; the race was called for Biden on 7 Nov

Time series of posting activity

[Figure: Hourly posting activity, October 26 - December 1, 2020 (posts per hour, times in ET); Election Day marked. Total posts: 969,207.]


Overview

Pipeline Overview

```mermaid
flowchart TD
    A["Raw Data<br>(~1M Posts)"] --> B["Stage 1: Binary Classification<br>(Sample 1K posts, 10 LLMs)"]
    B --> C["Stage 2: Intercoder Reliability<br>(Large vs Small Models)"]
    C --> D["Stage 3: ML Classifier (~1M Posts)<br>(Train on LLM labels)"]
    D --> E["Stage 4: Relationship Classification<br>(Sample ~100k pairs, 3 LLMs)<br>(Agreement/Disagreement)"]
    B --> F["Classification Task<br>Is the post about the election?"]
    D --> F
    E --> G["Classification Task<br>Do the statements agree?"]
```

Stage 1: Binary Classification


Task: Identify election-related posts using LLM ensemble

Input

  • Random sample: 1000 posts

Output

  • Each post labeled by 10 models
  • Labels: 0 (not election), 1 (election), -1 (error)

Models Used

Large Models (20-32B)

| Model | Parameters |
|---|---|
| gemma3:27b | 27B |
| llama4:scout | ~17B |
| gpt-oss:20b | 20B |
| deepseek-r1:32b | 32B |
| qwen3:30b | 30B |

Small Models (0.6-3.8B)

| Model | Parameters |
|---|---|
| phi3:3.8b | 3.8B |
| qwen3:0.6b | 0.6B |
| deepseek-r1:1.5b | 1.5B |
| llama3.2:1b | 1B |
| gemma3:1b | 1B |

Classification Results by Model

Error Rates by Model

| Model | Size | Errors | Error Rate |
|---|---|---|---|
| deepseek-r1:1.5b | Small | 3 | 0.3% |
| phi3:3.8b | Small | 0 | 0% |
| qwen3:0.6b | Small | 0 | 0% |
| llama3.2:1b | Small | 0 | 0% |
| gemma3:1b | Small | 0 | 0% |
| deepseek-r1:32b | Large | 0 | 0% |
| llama4:scout | Large | 0 | 0% |
| qwen3:30b | Large | 0 | 0% |
| gpt-oss:20b | Large | 1 | 0.1% |
| gemma3:27b | Large | 0 | 0% |

Stage 2: Intercoder Reliability


Purpose: Validate LLM annotations by measuring agreement between models

Key Question

Do large models agree with each other more than small models?

Reliability Metrics

| Metric | Large Models | Small Models |
|---|---|---|
| Number of models | 5 | 5 |
| Mean error rate (%) | 0.0 | 0.1 |
| Mean % classified as election | 21.0 | 48.9 |
| Fleiss’ Kappa | 0.846 | -0.006 |
| Krippendorff’s Alpha | 0.846 | -0.007 |
| Mean pairwise Cohen’s Kappa | 0.848 | 0.174 |
| Agreement with large-model majority (%) | 97.0 | 64.0 |
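As an illustration of the headline metric, Fleiss’ Kappa can be computed directly from the raw label matrix. A minimal pure-Python sketch (the input layout, one row of per-rater labels per post, is an assumption):

```python
def fleiss_kappa(ratings: list[list[int]]) -> float:
    """Fleiss' kappa. ratings[i] holds the labels assigned to
    item i, one per rater; every item has the same number of raters."""
    categories = sorted({label for row in ratings for label in row})
    n_items, n_raters = len(ratings), len(ratings[0])

    # n_ij: number of raters assigning item i to category j
    counts = [[row.count(c) for c in categories] for row in ratings]

    # Observed per-item agreement P_i, averaged to P-bar
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement P_e from marginal category proportions
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(len(categories))]
    p_e = sum(p * p for p in p_cat)
    return (p_bar - p_e) / (1 - p_e)
```

Perfect agreement yields 1; values near 0 (or below) mean agreement no better than chance, which is what the small-model column shows.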

Pairwise Cohen’s Kappa

Key Finding: Small Models Unreliable

| Model | % Election | Errors |
|---|---|---|
| deepseek-r1:1.5b | 13.8 | 3 |
| phi3:3.8b | 16.1 | 0 |
| qwen3:0.6b | 29.3 | 0 |
| llama3.2:1b | 89.6 | 0 |
| gemma3:1b | 95.9 | 0 |

Critical Issue

Small models (llama3.2:1b, gemma3:1b) classified 89-96% of posts as election-related, indicating they cannot discriminate between election and non-election content.

Decision: Use only large model majority for ground truth.
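The majority rule behind the `majority_large` column can be sketched as follows (the tie-handling convention after dropping errors is an assumption, not documented pipeline behavior):

```python
def majority_large(labels: list[int]) -> int:
    """Majority vote across the five large models.

    Labels: 0 (not election), 1 (election), -1 (model error).
    Errors are dropped before voting; a strict majority of the
    remaining votes is required to label a post election-related,
    so ties default to 0 (conservative). Returns -1 if every
    model errored.
    """
    valid = [label for label in labels if label in (0, 1)]
    if not valid:
        return -1
    return 1 if valid.count(1) * 2 > len(valid) else 0
```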

Stage 3: ML Classifier

Stage 3: ML Classifier Training

Purpose: Train traditional ML to scale classification without LLM inference cost

Ground Truth

  • majority_large column
  • Consensus of 5 large LLMs

Features

  • TF-IDF vectors
  • 5,000 features
  • 1-2 ngrams

Models Evaluated

| Model | Class Balancing |
|---|---|
| Logistic Regression | class_weight='balanced' |
| Random Forest | class_weight='balanced' |
| Gradient Boosting | Sample weights |

Best Model

Selected based on ROC-AUC score on held-out test set (20%)

Performance Metrics

| Metric | Value |
|---|---|
| Model Type | Logistic Regression |
| Accuracy | 0.945 |
| ROC-AUC | 0.957 |
| Training Samples | 800 |
| Test Samples | 200 |
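The training setup described above can be sketched with scikit-learn (the function name and data layout are illustrative, not the project’s actual code):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

def train_election_classifier(texts, labels):
    """TF-IDF (5,000 features, 1-2 grams) into a class-balanced
    logistic regression, evaluated by ROC-AUC on a 20% held-out set."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, random_state=42, stratify=labels)
    model = Pipeline([
        ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
        ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ])
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return model, auc
```

Once trained on the 1,000 LLM-labeled posts, the pipeline object scores the remaining ~1M posts with no further LLM inference cost.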

Outcome: Classification of posts using logistic regression

| Metric | Count |
|---|---|
| Total posts classified | 919,582 |
| Election-related | 179,281 (19.5%) |
| Not election-related | 740,301 (80.5%) |

Stage 4: Relationship Classification

Temporal Window Sampling

Goal: Create comparable samples across partisan information environments

Strategy

  1. Window creation: Group posts into 3-hour windows

  2. Weighted sampling: Up to 20 posts per window, weighted by engagement

  3. Three environments: Democrat, Republican, General pages

Rationale

| Choice | Reason |
|---|---|
| 3-hour windows | Temporal granularity with sufficient posts |
| Max 20 posts | Limits pairs: \(\binom{20}{2} = 190\) |
| Share-weighted | Prioritizes high-reach content |

Note

Partisan classification based on CrowdTangle list titles containing “democrat” or “republican”

Sample 3-hour window with max 20 posts
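One way to implement share-weighted sampling without replacement is the Efraimidis-Spirakis key trick (draw u~U(0,1) per post, keep the largest u^(1/weight) keys); a sketch under that assumption, with hypothetical field names:

```python
import random

def sample_window(posts, max_posts=20, seed=None):
    """Sample up to `max_posts` posts from one 3-hour window,
    weighted by share count, without replacement."""
    rng = random.Random(seed)
    keyed = [
        (rng.random() ** (1.0 / max(post["shares"], 1)), post)
        for post in posts
    ]
    keyed.sort(key=lambda kp: kp[0], reverse=True)
    return [post for _, post in keyed[:max_posts]]
```

Windows with fewer than 20 posts are returned in full; capping at 20 keeps the pairwise classification load at \(\binom{20}{2} = 190\) pairs per window.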

Stage 4: Infon Relationship Classification

Task: Classify semantic relationships between post pairs

AGREEMENT

Co-informative claims that support each other

DISAGREEMENT

Contradictory claims that cannot both be true

INDEPENDENCE

Unrelated claims with no bearing on each other

We used three large LLMs and took a majority vote:

  1. gemma3:27b, 27B
  2. llama4:scout, ~17B
  3. gpt-oss:20b, 20B

Pair Types

Post-Post Pairs

  • All pairwise combinations
  • Within 3-hour windows
  • Measures internal coherence

Post-Supra Pairs

  • Each post vs reference statement
  • Supra-infon: “The election has been administered fairly so far”
  • Measures alignment with neutral anchor
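Pair construction can be sketched with `itertools.combinations`, plus one pair per post against the anchoring statement (the data layout is illustrative):

```python
from itertools import combinations

SUPRA_INFON = "The election has been administered fairly so far"

def build_pairs(window_posts):
    """All post-post pairs within one 3-hour window, plus one
    post-supra pair per post. With 20 posts this yields
    C(20, 2) = 190 post-post pairs and 20 post-supra pairs."""
    post_post = list(combinations(window_posts, 2))
    post_supra = [(post, SUPRA_INFON) for post in window_posts]
    return post_post, post_supra
```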

Classification Prompt

You are an Information Analyst classifying the semantic relationship between two discrete items of information (infons) about the 2020 US presidential election.

CONTEXT: The 2020 US Presidential Election…

CLASSIFICATION CATEGORIES:

AGREEMENT: The infons are co-informative…

DISAGREEMENT: The infons are inconsistent or contradictory…

INDEPENDENCE: The infons are logically unrelated…

EXAMPLES:

Infon A: “Poll workers were excluded and couldn’t observe the count”
Infon B: “The election has been administered fairly so far”
Classification: DISAGREEMENT
Reason: Excluding observers implies unfair administration; these claims cannot both be true

Classification Results

| Party | Total Pairs | Post-Post | Post-Supra | % Agreement | % Disagreement | % Independent |
|---|---|---|---|---|---|---|
| democrat | 20,915 | 18,415 | 2,500 | 22.4 | 1.7 | 75.9 |
| general | 19,259 | 16,882 | 2,377 | 17.6 | 3.7 | 78.8 |
| republican | 62,160 | 56,240 | 5,920 | 8.1 | 3.0 | 88.9 |

Relationship Distribution

Preliminary Findings

Key Finding 1: Information disorder in the post-election

  • Information disorder peaks (~40%) in the general conversation one day post-election, coinciding with a surge in Republican posting activity and challenges to election fairness (75–100% disagreement with supra-infon).

  • Two days after the election—and persisting beyond the November 7 call—posts from general (civil society) accounts contesting election fairness remain elevated, stabilizing above 75%.

Key Finding 2: Role of algorithmic amplification

  • Does a post’s share count predict whether the post agrees or disagrees with the statement that the election is fair?
| Sample | Outcome | Coefficient (log-odds) | 95% CI | p-value | Sig. |
|---|---|---|---|---|---|
| General | Disagreement | 0.190 | [0.149, 0.230] | 0.0000 | *** |
| General | Agreement | 0.014 | [-0.045, 0.072] | 0.6462 | NA |
| Democrat | Disagreement | 0.169 | [0.075, 0.262] | 0.0004 | *** |
| Democrat | Agreement | 0.029 | [-0.048, 0.106] | 0.4571 | NA |
| Republican | Disagreement | 0.226 | [0.167, 0.285] | 0.0000 | *** |
| Republican | Agreement | 0.134 | [0.037, 0.229] | 0.0061 | ** |
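To illustrate the direction of these effects, a sketch on synthetic data only: regressing stance on log share count recovers a positive coefficient when disagreement is more widely shared (scikit-learn here for the point estimate; the CIs and p-values reported above would come from a full inferential fit):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic illustration: generate posts whose probability of
# disagreeing with the supra-infon rises with (log) share count,
# mimicking the positive log-odds coefficients reported above.
rng = np.random.default_rng(0)
log_shares = rng.normal(3.0, 1.5, size=5000)
p_disagree = 1 / (1 + np.exp(-(0.2 * log_shares - 1.0)))
disagrees = rng.binomial(1, p_disagree)

model = LogisticRegression().fit(log_shares.reshape(-1, 1), disagrees)
beta = model.coef_[0][0]  # positive: shares predict disagreement
```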

Key Finding

Disagreement is amplified across all three environments — posts challenging election fairness receive significantly more shares than those affirming it.

Algorithmic Amplification: Interpretation

[Figure: Effect of share count on stance toward election fairness. Log-odds coefficients with 95% CIs by sample (General, Democrat, Republican), separate panels for Agreement and Disagreement.]

Agreement (Left Panel)

  • Effects not significant for General and Democrat samples (CIs cross zero)
  • Small positive effect for Republican (β = 0.134, p < .01)

Disagreement (Right Panel)

  • Significant amplification in all three environments (p < .001)
  • Strongest effect in Republican sample (β = 0.226)
  • General and Democrat similar (~0.17–0.19)

References

Devlin, K. J. (1991). Logic and Information. Cambridge University Press.
Floridi, L. (2011). The Philosophy of Information. Oxford University Press.