3rd UNSW RESILIENT DEMOCRACY LAB WORKSHOP
2026-02-27
I would like to acknowledge the Traditional Owners of Australia and recognise their continuing connection to land, water and culture. The University of Sydney is located on the land of the Gadigal people of the Eora Nation. I pay my respects to their Elders, past and present.
Citizens face immersion in environments where coherent understanding becomes impossible
Key distinction: Healthy pluralism → Chaotic pluralism
Framework: Information environments as networks of semantic relationships
Information Disorder Measure:
\[ D = \frac{|E^-|}{|E^+| + |E^-| + |E^0|} \]
Where \(E^+\) = agreement, \(E^-\) = disagreement, \(E^0\) = independence
Ranges from 0 (no disagreement) to 1 (complete disagreement)
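As a minimal sketch of the measure above, the function below computes D from counts of agreement, disagreement, and independence edges. The function name and the example counts are illustrative assumptions, not values from the study.

```python
def information_disorder(n_agree: int, n_disagree: int, n_independent: int) -> float:
    """Share of disagreement edges: D = |E-| / (|E+| + |E-| + |E0|)."""
    total = n_agree + n_disagree + n_independent
    if total == 0:
        raise ValueError("at least one classified edge is required")
    return n_disagree / total


# Illustrative counts only (not taken from the study).
print(information_disorder(n_agree=30, n_disagree=10, n_independent=60))  # 0.1
```

Because the denominator includes independent pairs, an environment full of unrelated claims scores low on D even if the few related claims mostly conflict.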
Supra-infon: Anchoring claim outside immediate information space
“The election has been administered fairly so far”
Source: Facebook via CrowdTangle API (🪦 RIP)
Account types:
Coverage:
Time series of posting activity
flowchart TD
A["Raw Data<br>(~1M Posts)"] --> B["Stage 1: Binary Classification<br>(Sample 1K posts, 10 LLMs)"]
B --> C["Stage 2: Intercoder Reliability<br>(Large vs Small Models)"]
C --> D["Stage 3: ML Classifier (~1M Posts)<br>(Train on LLM labels)"]
D --> E["Stage 4: Relationship Classification<br>(Sample ~100k pairs, 3 LLMs)<br>(Agreement/Disagreement)"]
B --> F["Classification Task<br>Is the post about the election?"]
D --> F
E --> G["Classification Task<br>Do the statements agree?"]
Task: Identify election-related posts using LLM ensemble
Input
Output
| Large model | Parameters |
|---|---|
| gemma3:27b | 27B |
| llama4:scout | ~17B |
| gpt-oss:20b | 20B |
| deepseek-r1:32b | 32B |
| qwen3:30b | 30B |
| Small model | Parameters |
|---|---|
| phi3:3.8b | 3.8B |
| qwen3:0.6b | 0.6B |
| deepseek-r1:1.5b | 1.5B |
| llama3.2:1b | 1B |
| gemma3:1b | 1B |
| Model | Size | Errors | Error Rate |
|---|---|---|---|
| deepseek-r1:1.5b | Small | 3 | 0.3% |
| phi3:3.8b | Small | 0 | 0% |
| qwen3:0.6b | Small | 0 | 0% |
| llama3.2:1b | Small | 0 | 0% |
| gemma3:1b | Small | 0 | 0% |
| deepseek-r1:32b | Large | 0 | 0% |
| llama4:scout | Large | 0 | 0% |
| qwen3:30b | Large | 0 | 0% |
| gpt-oss:20b | Large | 1 | 0.1% |
| gemma3:27b | Large | 0 | 0% |
Purpose: Validate LLM annotations by measuring agreement between models
Key Question
Do large models agree with each other more than small models?
| Metric | Large models | Small models |
|---|---|---|
| Number of models | 5 | 5 |
| Mean error rate (%) | 0.0 | 0.1 |
| Mean % classified as election | 21.0 | 48.9 |
| Fleiss’ Kappa | 0.846 | -0.006 |
| Krippendorff’s Alpha | 0.846 | -0.007 |
| Mean pairwise Cohen’s Kappa | 0.848 | 0.174 |
| Agreement with large-model majority (%) | 97.0 | 64.0 |
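To make the reliability metrics concrete, here is a minimal pure-Python sketch of Fleiss' kappa for a set of items that every model labels. The function name and the toy ratings are illustrative assumptions; the workshop analysis may well have used a statistics library instead.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[int]]) -> float:
    """Fleiss' kappa for items that are all rated by the same number of raters.

    ratings[i] holds the labels that each rater (model) assigned to item i.
    Undefined (division by zero) if every rating falls in a single category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    observed = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Proportion of rater pairs that agree on this item.
        observed += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_observed = observed / n_items
    # Chance agreement from the pooled category proportions.
    p_expected = sum(
        (total / (n_items * n_raters)) ** 2 for total in category_totals.values()
    )
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example: 4 posts, 3 hypothetical models, binary labels.
toy = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
print(round(fleiss_kappa(toy), 3))  # 0.657
```

Values near 0 mean agreement is no better than chance, which is how the small-model figures in the table read.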
| Model | % Election | Errors |
|---|---|---|
| deepseek-r1:1.5b | 13.8 | 3 |
| phi3:3.8b | 16.1 | 0 |
| qwen3:0.6b | 29.3 | 0 |
| llama3.2:1b | 89.6 | 0 |
| gemma3:1b | 95.9 | 0 |
Critical Issue
Two small models (llama3.2:1b and gemma3:1b) classified 89.6% and 95.9% of posts as election-related, respectively, which indicates that they cannot discriminate between election and non-election content.
Decision: Use only large model majority for ground truth.
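The majority rule can be sketched as follows. The post IDs and votes are hypothetical, and the sketch assumes every large model returned a valid binary label; how failed outputs were handled is not stated here.

```python
def majority_label(votes: list[int]) -> int:
    """Return 1 if more than half of the binary votes are 1, otherwise 0."""
    return int(sum(votes) * 2 > len(votes))


# Hypothetical votes from the five large models for three posts.
large_model_votes = {
    "post_1": [1, 1, 1, 0, 1],
    "post_2": [0, 0, 1, 0, 0],
    "post_3": [1, 0, 1, 1, 0],
}

majority_large = {
    post_id: majority_label(votes) for post_id, votes in large_model_votes.items()
}
print(majority_large)
```

With an odd number of voters and binary labels, ties cannot occur, so no tie-breaking rule is needed in this sketch.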
Purpose: Train traditional ML to scale classification without LLM inference cost
Training labels: majority_large column
| Model | Class Balancing |
|---|---|
| Logistic Regression | class_weight='balanced' |
| Random Forest | class_weight='balanced' |
| Gradient Boosting | Sample weights |
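For readers unfamiliar with the balancing options in the table, the sketch below reproduces the heuristic that scikit-learn documents for `class_weight='balanced'`, namely `n_samples / (n_classes * count(class))`, and expands it into per-sample weights of the kind that can be passed to a gradient boosting model's `fit` call. The label counts are illustrative assumptions, not the actual training split.

```python
from collections import Counter


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """Class weights n_samples / (n_classes * count), as in class_weight='balanced'."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Illustrative labels: 200 election posts (1) and 800 non-election posts (0).
labels = [1] * 200 + [0] * 800
weights = balanced_class_weights(labels)
print(weights)  # {1: 2.5, 0: 0.625}

# Per-sample weights, the form needed when a model has no class_weight option.
sample_weights = [weights[y] for y in labels]
```

The minority class receives the larger weight, so each class contributes equally to the loss in total.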
Best Model
Selected based on ROC-AUC score on held-out test set (20%)
| Metric | Value |
|---|---|
| Model Type | Logistic Regression |
| Accuracy | 0.945 |
| ROC-AUC | 0.957 |
| Training Samples | 800 |
| Test Samples | 200 |
| Metric | Count |
|---|---|
| Total posts classified | 919,582 |
| Election-related | 179,281 (19.5%) |
| Not election-related | 740,301 (80.5%) |
Goal: Create comparable samples across partisan information environments
Window creation: Group posts into 3-hour windows
Weighted sampling: Up to 20 posts per window, weighted by engagement (shares)
Three environments: Democrat, Republican, General pages
| Choice | Reason |
|---|---|
| 3-hour windows | Temporal granularity with sufficient posts |
| Max 20 posts | Limits pairs: \(\binom{20}{2} = 190\) |
| Share-weighted | Prioritizes high-reach content |
Note
Partisan classification based on CrowdTangle list titles containing “democrat” or “republican”
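The windowing and sampling steps could look roughly like the sketch below. The field names (`created`, `shares`), the `shares + 1` weight, and the use of Efraimidis-Spirakis keys for weighted sampling without replacement are all assumptions made for illustration; the actual sampling procedure is not specified here.

```python
import math
import random
from collections import defaultdict
from datetime import datetime

WINDOW_HOURS = 3
MAX_POSTS = 20


def window_key(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 3-hour window."""
    return ts.replace(
        hour=ts.hour - ts.hour % WINDOW_HOURS, minute=0, second=0, microsecond=0
    )


def sample_window(posts: list[dict], rng: random.Random) -> list[dict]:
    """Draw up to MAX_POSTS posts without replacement, weighted by shares."""
    if len(posts) <= MAX_POSTS:
        return list(posts)
    # Assumption: weight = shares + 1, so zero-share posts can still be drawn.
    keyed = [(rng.random() ** (1.0 / (p["shares"] + 1)), p) for p in posts]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in keyed[:MAX_POSTS]]


def sample_by_window(posts: list[dict], seed: int = 0) -> dict:
    """Group posts into 3-hour windows and sample within each window."""
    rng = random.Random(seed)
    windows = defaultdict(list)
    for post in posts:
        windows[window_key(post["created"])].append(post)
    return {start: sample_window(group, rng) for start, group in sorted(windows.items())}


# Hypothetical posts: 50 in the 00:00 window, 5 in the 03:00 window.
posts = [
    {"id": i, "created": datetime(2020, 11, 3, 1, i % 60), "shares": i}
    for i in range(50)
]
posts += [
    {"id": 100 + i, "created": datetime(2020, 11, 3, 4, 30), "shares": 0}
    for i in range(5)
]
sampled = sample_by_window(posts)
print({start.isoformat(): len(group) for start, group in sampled.items()})
print(math.comb(MAX_POSTS, 2))  # 190 post-post pairs at most per window
```

Capping each window at 20 posts keeps the number of pairs sent to the LLMs bounded, at the cost of under-representing low-share content in busy windows.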
Task: Classify semantic relationships between post pairs
Co-informative claims that support each other
Contradictory claims that cannot both be true
Unrelated claims with no bearing on each other
You are an Information Analyst classifying the semantic relationship between two discrete items of information (infons) about the 2020 US presidential election.
CONTEXT: The 2020 US Presidential Election…
CLASSIFICATION CATEGORIES:
AGREEMENT: The infons are co-informative…
DISAGREEMENT: The infons are inconsistent or contradictory…
INDEPENDENCE: The infons are logically unrelated…
EXAMPLES:
Infon A: “Poll workers were excluded and couldn’t observe the count”
Infon B: “The election has been administered fairly so far”
Classification: DISAGREEMENT
Reason: Excluding observers implies unfair administration; these claims cannot both be true
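A rough sketch of how pairs for this task might be assembled and how a model reply might be mapped back to a category. The pairing scheme (all post-post combinations plus each post against the supra-infon) and the free-text reply format are assumptions; `build_pairs` and `parse_label` are hypothetical helpers, not the pipeline's own code.

```python
from itertools import combinations
from typing import Optional

SUPRA_INFON = "The election has been administered fairly so far"


def build_pairs(window_posts: list[str]) -> list[tuple[str, str, str]]:
    """Return (pair_type, infon_a, infon_b) tuples for one sampled window."""
    post_post = [("post-post", a, b) for a, b in combinations(window_posts, 2)]
    post_supra = [("post-supra", post, SUPRA_INFON) for post in window_posts]
    return post_post + post_supra


def parse_label(reply: str) -> Optional[str]:
    """Extract the category from a free-text model reply, or None if absent."""
    text = reply.upper()
    # DISAGREEMENT must be checked first: it contains AGREEMENT as a substring.
    for label in ("DISAGREEMENT", "AGREEMENT", "INDEPENDENCE"):
        if label in text:
            return label
    return None


pairs = build_pairs(["claim one", "claim two", "claim three"])
print(len(pairs))  # 6: three post-post pairs and three post-supra pairs
print(parse_label("Classification: DISAGREEMENT"))  # DISAGREEMENT
```

The label order in `parse_label` matters, because a naive substring check for AGREEMENT would also match every DISAGREEMENT reply.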
| Party | Total Pairs | Post-Post | Post-Supra | % Agreement | % Disagreement | % Independent |
|---|---|---|---|---|---|---|
| Democrat | 20,915 | 18,415 | 2,500 | 22.4 | 1.7 | 75.9 |
| General | 19,259 | 16,882 | 2,377 | 17.6 | 3.7 | 78.8 |
| Republican | 62,160 | 56,240 | 5,920 | 8.1 | 3.0 | 88.9 |
Information disorder peaks (~40%) in the general conversation one day post-election, coinciding with a surge in Republican posting activity and challenges to election fairness (75–100% disagreement with supra-infon).
From two days after the election, and persisting beyond the November 7 call, the share of posts from general (civil society) accounts that contest election fairness remains elevated, stabilizing above 75%.
| Sample | Outcome | Coefficient (log-odds) | 95% CI | p-value | Sig. |
|---|---|---|---|---|---|
| General | Disagreement | 0.190 | [0.149, 0.230] | <0.0001 | *** |
| General | Agreement | 0.014 | [-0.045, 0.072] | 0.6462 | n.s. |
| Democrat | Disagreement | 0.169 | [0.075, 0.262] | 0.0004 | *** |
| Democrat | Agreement | 0.029 | [-0.048, 0.106] | 0.4571 | n.s. |
| Republican | Disagreement | 0.226 | [0.167, 0.285] | <0.0001 | *** |
| Republican | Agreement | 0.134 | [0.037, 0.229] | 0.0061 | ** |
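Log-odds coefficients are easier to read as odds ratios. The short sketch below exponentiates the disagreement coefficients and confidence bounds from the table above. The scale of the predictor is not stated here, so each ratio should be read as the multiplicative change in odds per one-unit increase in whatever predictor the model uses.

```python
import math


def odds_ratio(log_odds: float) -> float:
    """Convert a log-odds coefficient to an odds ratio."""
    return math.exp(log_odds)


# Disagreement rows from the table: (coefficient, CI lower, CI upper).
disagreement = {
    "General": (0.190, 0.149, 0.230),
    "Democrat": (0.169, 0.075, 0.262),
    "Republican": (0.226, 0.167, 0.285),
}

for sample, (coef, low, high) in disagreement.items():
    print(
        f"{sample}: OR = {odds_ratio(coef):.2f} "
        f"[{odds_ratio(low):.2f}, {odds_ratio(high):.2f}]"
    )
```

A confidence interval for the odds ratio that excludes 1 corresponds to a log-odds interval that excludes 0, which holds for all three disagreement rows.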
Key Finding
Disagreement is amplified across all three environments — posts challenging election fairness receive significantly more shares than those affirming it.