Measuring Information Disorder

.title[
# Measuring Information Disorder
]
.author[
### Francesco Bailo
]
.institute[
### University of Sydney
]
.date[
### updated: 2023-11-27<br><br>Australian Social Network Analysis Conference<br>University of New South Wales<br>27–28 November, 2023
]

---

background-image: url(https://upload.wikimedia.org/wikipedia/en/6/6a/Logo_of_the_University_of_Sydney.svg)
background-size: 95%

---

# Information disorder

---

## Why do we need a practical definition of information disorder?

The battle against misinformation is fundamentally impossible to control and win.

1. Content distribution platforms (i.e. social media) are

    * geared to increase volumes, not to foster quality. With limited exceptions, news media play by the same rules.

    * not resilient to *epistemic gaps*, which occur regularly because there is always a gap between events (e.g. COVID-19) and knowledge (e.g. interpretations and explanations) about those events.

2. *Information chaos* is a well-defined and recurrently used political and geopolitical tactic. Information chaos is not about misinformation/disinformation but about the diffusion into an information space of more information than people can navigate. This results in generalised mistrust.

3. Recent advances in LLMs have provided sophisticated new functionality for creating even more credible content (text, video, audio).

---

# What is information?

---

#### Information as truth: Floridi, L. (2010). *Information: A very short introduction*. Oxford University Press.

<div class="grViz html-widget html-fill-item-overflow-hidden html-fill-item" id="htmlwidget-5702870d59e60cbcbc59" style="width:60%;height:504px;"></div>
<script type="application/json" data-for="htmlwidget-5702870d59e60cbcbc59">{"x":{"diagram":"\ndigraph {\n\n  # graph attributes\n  graph [overlap = true]\n\n  # node attributes\n  node [shape = box,\n        fontname = Helvetica,\n        color = black]\n\n  # edge attributes\n  edge [color = gray]\n\n  # node statements\n  A [label = \"data\n(unstructured)\"]; \n  B [label = \"environmental\"]; \n  C [label = \"semantic\n(content)\"]; \n  D [label = \"instructional\"];\n  E [label = \"factual\"];\n  F [label = \"untrue\"];\n  G [label = \"true\n(information)\", style = filled, fillcolor = \"orange\"]\n  H [label = \"knowledge\"]\n  I [label = \"unintentional\n(misinformation)\"]\n  J [label = \"intentional\n(disinformation)\"]\n\n  # edge statements\n  A->B; A->C; C->D; C->E;\n  E->F; E->G; G->H;\n  F->I; F->J;\n  \n}\n","config":{"engine":"dot","options":null}},"evals":[],"jsHooks":[]}</script>

---

#### But ...

Information as truth is problematic, especially when it comes to social media content.

* Truth is difficult or impossible to ascertain in time (e.g. fact-checking is laborious and becomes available only hours or even days after the content has been distributed);

* Truth might not be available for lack of scientific consensus (e.g. at the onset of the COVID-19 pandemic).

---

## A practical definition of information

**Information** is content that is meaningful in a semantic sense (i.e. we can understand its meaning).

So this is **not information**: 010101110001010101010100101

But this is **information**, no matter where you stand on the vaccine-autism connection:

.center[<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Sr CDC vaccine scientist: Thimerosal in flu shots given to pregnant mothers &quot;causes autism-like features&quot; in children<br>http://t.co/OBCOnDAoKV</p>&mdash; Robert F. Kennedy Jr (@RobertKennedyJr) <a href="https://twitter.com/RobertKennedyJr/status/531912864555347968?ref_src=twsrc%5Etfw">November 10, 2014</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>]

---

## A practical definition of information

<div class="grViz html-widget html-fill-item-overflow-hidden html-fill-item" id="htmlwidget-944474421fcbef021782" style="width:504px;height:504px;"></div>
<script type="application/json" data-for="htmlwidget-944474421fcbef021782">{"x":{"diagram":"\ndigraph {\n\n  # graph attributes\n  graph [overlap = true]\n\n  # node attributes\n  node [shape = box,\n        fontname = Helvetica,\n        color = black]\n\n  # edge attributes\n  edge [color = gray]\n\n  # node statements\n  A [label = \"data\n(unstructured)\"]; \n  B [label = \"environmental\"]; \n  C [label = \"semantic\n(content)\n(information)\", style = filled, fillcolor = \"orange\"]; \n  D [label = \"instructional\"];\n  E [label = \"factual\"];\n  F [label = \"untrue\"];\n  G [label = \"true\"]\n  H [label = \"knowledge\"]\n  I [label = \"unintentional\n(misinformation)\"]\n  J [label = \"intentional\n(disinformation)\"]\n\n  # edge statements\n  A->B; A->C; C->D; C->E;\n  E->F; E->G; G->H;\n  F->I; F->J;\n  \n}\n","config":{"engine":"dot","options":null}},"evals":[],"jsHooks":[]}</script>

---

## A practical definition of information disorder

* Like information, information disorder is *agnostic to knowledge* (true belief).

So, information (dis)order is

> The likelihood of encountering (in)consistent information to solve a sense-making problem.

In other words, information (dis)order is a measure of the overall *coherence* of an information flow.

For example,

* Do I need to wear a mask?

* Do I need to vaccinate my kids?

* Do I need to evacuate?

---

# How to measure information disorder

---

### How to measure information disorder: A network approach

Information coherence is measured from a *multilayer network*, where nodes (actors) are *statements* related by four types of edges.

Statements are linked:

1. if they agree (first layer);

2. if they disagree (second layer);

3. if they are equivalent/redundant (third layer);

4. and finally, if they refer to the same external source (URL) (fourth layer).
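
A minimal sketch of how such a multilayer network could be stored, assuming statements are identified by integer indices and each layer is a set of undirected edges. The layer names, the `build_multilayer` helper and the example edges are illustrative assumptions, not the implementation used for this talk.

```python
# One undirected edge set per layer; nodes are statement indices.
# Layer names are hypothetical labels for the four edge types on the slide.
LAYERS = ("agreement", "disagreement", "redundancy", "external_source")


def build_multilayer(n_statements, coded_edges):
    """Return {layer: set of (i, j) edges with i < j} from (i, j, layer) triples."""
    layers = {name: set() for name in LAYERS}
    for i, j, layer in coded_edges:
        if layer not in layers:
            raise ValueError(f"unknown layer: {layer}")
        if i == j:
            raise ValueError("a statement cannot be linked to itself")
        if not (0 <= i < n_statements and 0 <= j < n_statements):
            raise IndexError(f"edge ({i}, {j}) refers to a missing statement")
        # Store each undirected edge once, smallest index first.
        layers[layer].add((min(i, j), max(i, j)))
    return layers


# Made-up coding of four statements; the same pair may sit in several layers.
coded_edges = [
    (0, 1, "redundancy"),
    (0, 2, "disagreement"),
    (3, 0, "agreement"),
    (0, 3, "external_source"),
]
network = build_multilayer(4, coded_edges)
print(network["agreement"])
```

Keeping one edge set per layer makes the later per-layer density and the flattening step simple set operations.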

---

#### How to measure information disorder: A network approach


---

### How to measure information disorder: A network approach

#### Order vs Disorder

We can calculate the *density* `$D$` of each graph, then combine the four densities as:

`$$(D_{Agreement} - D_{Disagreement}) + D_{Redundancy} + D_{ExternalSource}$$`
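
As a rough illustration, the expression above can be computed as follows, assuming each layer is an undirected simple graph over the same `n` statements, so that density is `2m / (n(n - 1))` for `m` edges. The function names and the toy edge sets are assumptions made for this sketch.

```python
def density(n_nodes, edges):
    """Density of an undirected simple graph: observed edges over possible edges."""
    if n_nodes < 2:
        return 0.0
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))


def order_score(n_nodes, layers):
    """(D_Agreement - D_Disagreement) + D_Redundancy + D_ExternalSource."""
    d = {name: density(n_nodes, edges) for name, edges in layers.items()}
    return (d["agreement"] - d["disagreement"]) + d["redundancy"] + d["external_source"]


# Toy example with 4 statements (6 possible edges per layer).
layers = {
    "agreement": {(0, 1), (1, 2)},   # density 2/6
    "disagreement": {(0, 3)},        # density 1/6
    "redundancy": {(1, 2)},          # density 1/6
    "external_source": {(0, 1)},     # density 1/6
}
score = order_score(4, layers)
print(round(score, 3))
```

Under these assumptions the score ranges from -1 (only disagreement, fully connected) to 3 (agreement, redundancy and shared sources all fully connected).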

---

### How to measure information disorder: A network approach

#### Pluralism vs Singularism

`$$Agreement + Redundancy + ExternalSource$$`

I can flatten these three graphs to get a more comprehensive agreement graph (including *redundancy* and shared *external sources*).

---

### How to measure information disorder: A network approach

#### Pluralism vs Singularism


A pluralism vs singularism index can be measured as the normalised number of communities in the flattened agreement graph.
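
A minimal sketch of this index, under two assumptions that the slide does not specify: communities are approximated by connected components (a proper community-detection algorithm could be swapped in), and the count is normalised by the number of statements. All names below are hypothetical.

```python
def flatten(layers, keep=("agreement", "redundancy", "external_source")):
    """Union of the edge sets of the layers that express some form of agreement."""
    flat = set()
    for name in keep:
        flat |= layers.get(name, set())
    return flat


def count_components(n_nodes, edges):
    """Number of connected components, computed with a small union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
    return len({find(x) for x in range(n_nodes)})


def pluralism_index(n_nodes, layers):
    """Communities (here: components) per statement; assumed normalisation."""
    if n_nodes == 0:
        return 0.0
    return count_components(n_nodes, flatten(layers)) / n_nodes


layers = {
    "agreement": {(0, 1)},
    "redundancy": {(1, 2)},
    "external_source": {(3, 4)},
    "disagreement": {(2, 3)},  # ignored when flattening
}
# Components among 6 statements: {0, 1, 2}, {3, 4}, {5}
index = pluralism_index(6, layers)
print(index)
```

With this normalisation, values near `1/n` indicate a single shared position (singularism) and values near 1 indicate that almost every statement stands alone (pluralism).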

---

### The order vs disorder / singularism vs pluralism chart

---

# How to measure information disorder (in practice)

---

## 1. Define an information space

For example, I am currently working on the conversation around mask use during the COVID-19 pandemic (Feb 2020 - Jan 2021) on the English version of Wikipedia.

## 2. Define time intervals

For example, I am using 24-hour intervals. That is, I am producing an estimate of **Information Pluralism** and **Information Disorder** for each 24-hour window.
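
A small sketch of the windowing step, assuming each piece of content carries a timezone-aware timestamp and that windows are counted from a fixed start time. The `bucket_by_window` helper and the sample items are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def bucket_by_window(items, start, hours=24):
    """Group (timestamp, text) pairs into consecutive windows counted from `start`."""
    window = timedelta(hours=hours)
    buckets = defaultdict(list)
    for timestamp, text in items:
        if timestamp < start:
            continue  # content before the observation period is ignored
        index = (timestamp - start) // window  # integer window number
        buckets[index].append(text)
    return dict(buckets)


utc = timezone.utc
start = datetime(2020, 2, 1, tzinfo=utc)
items = [
    (datetime(2020, 2, 1, 8, 0, tzinfo=utc), "a"),
    (datetime(2020, 2, 1, 23, 59, tzinfo=utc), "b"),
    (datetime(2020, 2, 2, 0, 0, tzinfo=utc), "c"),
    (datetime(2020, 2, 3, 12, 0, tzinfo=utc), "d"),
]
windows = bucket_by_window(items, start)
print(windows)
```

Each bucket then yields one network, and therefore one pair of estimates, per 24-hour window.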

## 3. Sample content from each interval

How much content to sample depends on the available classification capacity. Nodes (i.e. "content") can be a sentence, a social media post or an article. With Wikipedia I am selecting sentences.
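
To make the capacity constraint concrete: if every pair of sampled nodes is classified (an assumption of this sketch, not something stated above), `n` nodes require `n(n - 1) / 2` comparisons. The hypothetical helpers below derive the largest sample that fits a given comparison budget and draw a reproducible random sample.

```python
import math
import random


def max_sample_size(pair_budget):
    """Largest n such that n * (n - 1) / 2 <= pair_budget."""
    if pair_budget < 0:
        raise ValueError("pair_budget must be non-negative")
    return (1 + math.isqrt(1 + 8 * pair_budget)) // 2


def sample_sentences(sentences, pair_budget, seed=0):
    """Random sample whose all-pairs comparison count fits within the budget."""
    n = min(len(sentences), max_sample_size(pair_budget))
    return random.Random(seed).sample(sentences, n)


sentences = [f"sentence {k}" for k in range(50)]  # placeholder content
sample = sample_sentences(sentences, pair_budget=100)
print(len(sample))  # 14 sentences give 91 pairs; 15 would give 105
```

Because the number of pairs grows quadratically, doubling the sample roughly quadruples the number of classifications per window.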

---

## 4. Code the relationships among nodes using an LLM (e.g. ChatGPT): *Agreement*, *Disagreement*, *Redundancy* graphs

> prompt = f"Responding with one word, the information that is possible to extract from the two texts appears to be in 'agreement', 'disagreement' or 'equivalent' (please respond using only one of these four labels: agreement, disagreement, equivalent, none)\ntext 1: '{node_x}'\ntext 2: '{node_y}'"
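
A sketch of how the pairwise coding loop might look. The model call is deliberately left as a caller-supplied function (`ask_llm`, hypothetical) so that no particular API is assumed; here a stub stands in for it. The prompt wording is adapted from the one above, and treating unrecognised replies as `None` is an assumption of this sketch.

```python
from itertools import combinations

LABELS = ("agreement", "disagreement", "equivalent", "none")


def build_prompt(node_x, node_y):
    """Prompt for one pair of texts, adapted from the wording on the slide."""
    return (
        "Responding with one word, the information that is possible to extract "
        "from the two texts appears to be in 'agreement', 'disagreement' or "
        "'equivalent' (please respond using only one of these four labels: "
        "agreement, disagreement, equivalent, none)\n"
        f"text 1: '{node_x}'\ntext 2: '{node_y}'"
    )


def parse_label(reply):
    """Normalise a model reply to one of LABELS, or None if it is unusable."""
    word = reply.strip().strip(".'\"").lower()
    return word if word in LABELS else None


def code_pairs(statements, ask_llm):
    """Classify every unordered pair of statements with the supplied function."""
    coded = {}
    for (i, x), (j, y) in combinations(enumerate(statements), 2):
        coded[(i, j)] = parse_label(ask_llm(build_prompt(x, y)))
    return coded


def stub_llm(prompt):
    """Stand-in for a real model call; always gives the same noisy reply."""
    return " Agreement.\n"


coded = code_pairs(["Masks help.", "Masks work.", "Wear a mask."], stub_llm)
print(coded)
```

Pairs labelled `agreement`, `disagreement` and `equivalent` would populate the *Agreement*, *Disagreement* and *Redundancy* graphs respectively; `none` and unusable replies add no edge.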

## 5. Extract and compare URLs from content: *ExternalSource* graph
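
One possible way to build this layer, assuming URLs appear as plain `http(s)://` strings in the text and that two nodes are linked when they share at least one identical URL. The regular expression is a simplification and the helper names are hypothetical; real content may need URL normalisation (redirects, tracking parameters) that this sketch does not attempt.

```python
import re
from itertools import combinations

# Simplified pattern: scheme followed by anything up to whitespace or a bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(text):
    """Set of URLs found in text, with trailing sentence punctuation removed."""
    return {match.rstrip(".,;:!?'\"") for match in URL_PATTERN.findall(text)}


def external_source_edges(statements):
    """Undirected edges between statements that cite at least one common URL."""
    urls = [extract_urls(text) for text in statements]
    return {
        (i, j)
        for i, j in combinations(range(len(statements)), 2)
        if urls[i] & urls[j]
    }


statements = [
    "Masks reduce transmission, see https://example.org/study.",
    "According to https://example.org/study masks help.",
    "No link in this sentence.",
    "A different source: https://example.com/other",
]
edges = external_source_edges(statements)
print(edges)
```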

---

## Critical normative questions we can answer with information from this measurement:

1. Where should conversations be? (More or less tolerance for pluralism/disorder, depending on the topic), and

2. How should conversations evolve?