Knight Research Network Tool Demonstration Day

October 13, 2021


The Knight Research Network (KRN) was created in 2019 when the John S. and James L. Knight Foundation invested in centers and projects with the goal of "identifying how society can respond to the ways in which digital technology has revolutionized the production, dissemination and consumption of information."

Digital tools developed by members of KRN and our friends at other institutions further this goal by making it easier for researchers to gather and sort data, bring data-driven information to the public in innovative visual formats, fact-check news and other information, and detect bots and trolls online.

The KRN Tool Demonstration Day is a free, virtual event, open to the public.

October 13, 2021
10:30 AM - 3:00 PM

Available Recordings and Links

GitHub and other links shared during the event can be found in this document.

Tool categories: Data Collection, Transformation, or Aggregation; Data Visualizers; Detection; Fact Checking/Verifiers

Schedule

The event runs on two concurrent tracks, beginning at 10:30 AM in Track 1. The full schedule for both tracks is below. Attendees can move between the two tracks to see different demonstrations.

Track 1: 10:30 AM - 2:00 PM

Theme	No.	Start Time	End Time	Tool	Demonstrator
Opening Remarks		10:30 AM	11:00 AM	Welcome	Kathleen M. Carley
Detection	1	11:00 AM	11:15 AM	Botometer	Kai-Cheng Yang
	2	11:15 AM	11:30 AM	BotBuster and BotHunter	Lynnette Ng
	3	11:30 AM	11:45 AM	BotSlayer	Pik-Mai Hui
	4	11:45 AM	12:00 PM	Hate Speech Detection; TrollHunter	Joshua Uyheng
		12:00 PM	12:30 PM	BREAK
Fact Checking/Verifiers	5	12:30 PM	12:45 PM	StoryGraph	Alexander Nwala
	6	12:45 PM	1:00 PM	Hoaxy	Christopher Torres
	7	1:00 PM	1:15 PM	CoVerifi COVID-19 news verification system	Nikhil Kolluri
	8	1:15 PM	1:30 PM	ExFacto	Anubrata Das
	9	1:30 PM	1:45 PM	FBAdTracker	Ujun Jeong
	10	1:45 PM	2:00 PM	The Lumen Database	Adam Holland

Track 2: 11:00 AM - 2:45 PM

Theme	No.	Start Time	End Time	Tool	Demonstrator
Data Collection/Transformation/Aggregation	1	11:00 AM	11:15 AM	youtube-data-api	Megan Brown
	2	11:15 AM	11:30 AM	smaberta	Megan Brown
	3	11:30 AM	11:45 AM	urlExpander	Megan Brown
	4	11:45 AM	12:00 PM	Quiz Creator	Jessica Collier
	5	12:00 PM	12:15 PM	EXPO2O	Bohan Jiang
	6	12:15 PM	12:22 PM	Twitter V2 Conversation and Timeline Collector and v2 to v1 Tweet Converter	Isabel Murdock
	7	12:22 PM	12:30 PM	Image emotion classifier	Lynnette Ng
	8	12:30 PM	12:38 PM	Hashtag & URL Coordination	Tom Magelinski
		12:38 PM	1:00 PM	BREAK
Data Visualizers	9	1:00 PM	1:15 PM	ORA	Kathleen M. Carley
	10	1:15 PM	1:30 PM	NetMapper	Kathleen M. Carley
	11	1:30 PM	1:45 PM	PIEGraph	Deen Freelon
	12	1:45 PM	2:00 PM	CoVaxxy	Matthew DeVerna
	13	2:00 PM	2:15 PM	Twitter Simulation in Construct	Stephen Dipple
	14	2:15 PM	2:30 PM	CauseBox	Paras Sheth
Closing Remarks		2:30 PM	2:45 PM

Our Collaborators

The Knight Research Network Tool Demonstration Day is planned and organized by KRN member centers, including CMU IDeaS.

Program

The program listed on this page is also available as a PDF document.

Download KRN Tool Demo Day Program

Information on Tools

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
Botometer		Kai-Cheng Yang, Pik-Mai Hui, Christopher Torres, Alexander Nwala, Matthew DeVerna, John Bryden, Filippo Menczer	Indiana University Observatory on Social Media	Tweet data from Twitter’s API	Bot scores ranging from 0 to 1	The tool has a website and an API endpoint. The website is free for anyone with a Twitter account. The API endpoint is free with a limited quota for anyone with a Twitter developer account; paid plans allow more requests.

Contact: yangkc@iu.edu

Description: Botometer checks the activity of a Twitter account and gives it a score. Higher scores mean more bot-like activity.

Tool	URL	Demonstrator	Company or Center	Input	Output	Free or purchase
BotBuster	Forthcoming	Lynnette Ng	CMU IDeaS/CASOS	Twitter V1, V2 JSON; Reddit JSON	CSV	Not currently available for public use

Contact: lynnetteng@cmu.edu

Description: BotBuster uses a mixture-of-experts approach to bot detection. This approach handles data that is incomplete because of data collection limitations or account suspension. Each input (e.g., username, screen name, text) is handled by its own expert, trained individually with treatments specific to that input's quirks, and separate predictions are made from whatever information is available. The predictions are then combined by a gating network to output a bot probability.
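The gating idea can be illustrated with a small Python sketch. It is not BotBuster's implementation: the per-field "experts" are toy heuristics standing in for trained models, and the function name `moe_bot_probability` and its `gate_logits` parameter are hypothetical. What it shows is how a softmax taken over only the available experts lets a missing field drop out of the prediction instead of breaking it.

```python
import math
from typing import Callable, Mapping, Optional

# An "expert" maps one field of an account record to a bot probability in [0, 1].
Expert = Callable[[str], float]


def moe_bot_probability(
    account: Mapping[str, Optional[str]],
    experts: Mapping[str, Expert],
    gate_logits: Mapping[str, float],
) -> float:
    """Combine the predictions of the experts whose input field is present.

    A softmax over the gate scores of the available experts turns them into
    mixture weights, so a missing field (e.g., no tweet text for a suspended
    account) simply drops out of the weighted average.
    """
    available = [field for field in experts if account.get(field)]
    if not available:
        raise ValueError("account record has no usable fields")
    # Subtract the largest logit before exponentiating for numerical stability.
    top = max(gate_logits[f] for f in available)
    weights = {f: math.exp(gate_logits[f] - top) for f in available}
    total = sum(weights.values())
    return sum(weights[f] / total * experts[f](account[f]) for f in available)


# Toy stand-ins for trained per-field models (hypothetical heuristics).
toy_experts = {
    "screen_name": lambda s: min(1.0, sum(c.isdigit() for c in s) / 4),
    "text": lambda t: 0.9 if "follow back" in t.lower() else 0.1,
}
equal_gates = {"screen_name": 0.0, "text": 0.0}

# Text missing: only the screen-name expert contributes.
print(moe_bot_probability({"screen_name": "user12345678", "text": None},
                          toy_experts, equal_gates))
# Both fields present: the two expert outputs are averaged with equal weights.
print(moe_bot_probability({"screen_name": "user12345678", "text": "Please follow back"},
                          toy_experts, equal_gates))
```

In a trained gating network the gate scores would normally depend on the input rather than being fixed constants; fixed scores keep the sketch short.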

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
BotHunter v1		David Beskow	CMU IDeaS/CASOS	Twitter JSON (v1)	CSV	Free, limited access

Contact: info@netanomics.com

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
BotHunter v2	Forthcoming	Netanomics	CMU IDeaS/CASOS	Twitter JSON (v1 and v2)	CSV	Purchase (educational discount available)

Contact: info@netanomics.com

Description: BotHunter: A Tiered Approach to Detecting and Characterizing Automated Activity on Twitter.

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
BotSlayer		Pik-Mai Hui, Kai-Cheng Yang, Christopher Torres, Alexander Nwala, Matthew DeVerna, John Bryden, Filippo Menczer	Indiana University Observatory on Social Media	A user-generated query and a Twitter API key to collect tweets	A dashboard that ranks likely malicious entities and provides various statistics and visualizations	Free

Contact: huip@iu.edu

Description: BotSlayer is an application that helps track and detect potential manipulation of information spreading on Twitter. BotSlayer uses an anomaly detection algorithm to flag hashtags, links, accounts, and media that are trending and amplified in a coordinated fashion by likely bots. A Web dashboard lets users explore the tweets and accounts associated with suspicious campaigns via Twitter, visualize their spread via Hoaxy, and search related images and content on Google.
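BotSlayer's exact anomaly detection algorithm is not described here. As a rough illustration of the general idea only, the hypothetical Python sketch below flags entities (hashtags, links, and so on) whose count in the current time window is far above their historical average, using a z-score test, and then weights each burst by the mean bot score of the accounts amplifying it. The function name `flag_suspicious_entities` and the `z_threshold` and `min_std` parameters are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple


def flag_suspicious_entities(
    history: Dict[str, List[int]],
    window_posts: Iterable[Tuple[str, str]],
    bot_scores: Dict[str, float],
    z_threshold: float = 3.0,
    min_std: float = 1.0,
) -> Dict[str, float]:
    """Rank entities that burst in the current window and are pushed by bot-like accounts.

    history: entity -> counts observed in previous windows
    window_posts: (entity, account_id) pairs seen in the current window
    bot_scores: account_id -> bot score in [0, 1]; unknown accounts count as 0
    """
    counts: Dict[str, int] = defaultdict(int)
    amplifier_scores: Dict[str, List[float]] = defaultdict(list)
    for entity, account in window_posts:
        counts[entity] += 1
        amplifier_scores[entity].append(bot_scores.get(account, 0.0))

    flagged = {}
    for entity, count in counts.items():
        past = history.get(entity) or [0]
        # Floor the standard deviation so rarely seen entities do not get infinite z-scores.
        z = (count - mean(past)) / max(pstdev(past), min_std)
        if z >= z_threshold:
            # Weight the burst by how bot-like its amplifiers are on average.
            flagged[entity] = z * mean(amplifier_scores[entity])
    return dict(sorted(flagged.items(), key=lambda kv: kv[1], reverse=True))


history = {"#vote": [10, 12, 11], "#rigged": [0, 1, 0]}
posts = [("#vote", f"human{i}") for i in range(12)] + [("#rigged", f"bot{i}") for i in range(8)]
scores = {f"bot{i}": 0.9 for i in range(8)}
print(flag_suspicious_entities(history, posts, scores))
```

Here `#vote` is busy but within its usual range, while `#rigged` jumps from near zero and is pushed entirely by high-scoring accounts, so only the latter is flagged.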

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
CauseBox		Paras Sheth, Ujun Jeong, Ruocheng Guo, Huan Liu, K. Selcuk Candan	Data Mining and Machine Learning Lab, Arizona State University	Treatment Effect Estimation model and Benchmark Data	Evaluation measures like PEHE, Policy Risk, error on ATE, etc.	The tool is free for use

Contact: psheth5@asu.edu

Description: CauseBox is a unified platform meant to serve as a benchmark for an ensemble of machine learning and deep learning based treatment effect estimation methods. It allows users to run and compare seven state-of-the-art treatment effect estimation methods on benchmark datasets widely accepted in the causal inference literature. The tool is helpful for researchers who want to compare their own methods against these benchmark methods. CauseBox supports both a GUI and a command-line interface.
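Two of the evaluation measures in the CauseBox output column, PEHE and the error on the ATE, have standard definitions in the treatment effect literature. The Python sketch below computes them for a semi-synthetic benchmark, where both potential outcomes are known for every unit. It is a minimal illustration under those definitions, not CauseBox's code; policy risk is omitted, and some papers report PEHE as a mean squared error rather than its square root (this sketch uses the root form).

```python
import math
from typing import Sequence


def _check(true_ite: Sequence[float], pred_ite: Sequence[float]) -> int:
    if not true_ite or len(true_ite) != len(pred_ite):
        raise ValueError("need two non-empty sequences of equal length")
    return len(true_ite)


def pehe(true_ite: Sequence[float], pred_ite: Sequence[float]) -> float:
    """Precision in Estimation of Heterogeneous Effect: root mean squared error
    between true and estimated individual treatment effects (lower is better)."""
    n = _check(true_ite, pred_ite)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(true_ite, pred_ite)) / n)


def ate_error(true_ite: Sequence[float], pred_ite: Sequence[float]) -> float:
    """Absolute difference between the estimated and true average treatment effect."""
    n = _check(true_ite, pred_ite)
    return abs(sum(pred_ite) / n - sum(true_ite) / n)


# Semi-synthetic benchmarks provide both potential outcomes, so the true
# individual effect of each unit is mu1 - mu0.
mu0 = [0.0, 1.0, 2.0]
mu1 = [1.0, 3.0, 5.0]
true_effects = [b - a for a, b in zip(mu0, mu1)]  # [1.0, 2.0, 3.0]
estimated = [1.0, 2.0, 5.0]
print(pehe(true_effects, estimated), ate_error(true_effects, estimated))
```

The example also shows why both measures are reported: a model can be close on the average effect while still missing individual effects, which PEHE penalizes.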

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
CoVaxxy		Matthew DeVerna, Kai-Cheng Yang, Pik-Mai Hui, Christopher Torres, Alexander Nwala, John Bryden, Filippo Menczer	Indiana University Observatory on Social Media	Tweets related to COVID-19 vaccines, collected in real time (using the Twitter API’s filtered streaming endpoint) since January 4, 2020	An interactive web-based data visualization dashboard; PNG images of visualizations; tweet IDs for academic research via rehydration	Free

Contact: mdeverna@iu.edu

Description: CoVaxxy is a web-based data visualization dashboard that allows users to concurrently explore the relationship between COVID-19 vaccine conversations, vaccine uptake, and epidemic trends in the United States. The dashboard tracks and quantifies credible information and misinformation narratives over time, as well as their sources and related popular keywords. Furthermore, vaccine uptake and conversation statistics are visualized geographically at the U.S. state-level. The dashboard is updated daily and the data that the dashboard utilizes is made publicly available for others to rehydrate via the Twitter API.

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
CoVerifi COVID-19 news verification system		Nikhil L Kolluri	The University of Texas at Austin	Text data, social media API feeds, News API feeds, and other API-derived data	Multiple outputs (including user ratings, machine learning outputs, and Botometer results) that indicate the likelihood that news is fake or factual. Users can also add their own assessments, which are published as votes are collected.	Free

Contact: nlkolluri@utexas.edu

Description: Manual fact-checking cannot keep up with the large volume of COVID-19-related fake news that now exists. To help classify the fake news proliferating in the COVID-19 ‘infodemic’, we developed CoVerifi, an automated open-source tool to verify COVID-19 news and information. CoVerifi integrates crowdsourcing, newsfeeds, social media, and machine learning. Users of the web-based tool can also “vote” on news content, making the CoVerifi platform an effective method for collecting labeled data. To develop our fake news detection tool, we built a crowdsourced dataset of ~7,000 entries, which we used to test and validate our model. Ultimately, rather than labeling news content as fake or fact, CoVerifi empowers users to make their own consumption decisions by providing multiple data points.

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
ExFacto		Anubrata Das	University of Texas at Austin	Text data in a search box	a) a set of evidence related to the claim; b) the stance of each piece of evidence; c) the source reputation of the presented evidence; d) the overall veracity of the claim	We use this tool to study human-AI interaction in the context of fact-checking; it is not meant for production.

Contact: anubrata@utexas.edu

Description: Although there has been a plethora of research on automated fake news detection, in practice much of the fake news detection effort still relies on human labor. The lack of adoption of fake news detection algorithms can be attributed to the complexity of the problem and the high cost of errors. We propose a tool that aims to close this gap by assisting human fact-checkers in their decision-making process. The tool adopts the methodology of evidence-based, explainable fact-checking. Users type a claim into a search box, and the tool retrieves a set of evidence relevant to the claim with the help of search engines such as Bing or Google. The tool then calculates the stance of each piece of evidence and the reputation of each source, and aggregates the evidence into a veracity outcome. If the model makes a mistake, users can override its components (the stance and reputation of the evidence), and the model takes this input into account to update the claim's veracity outcome.
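One simple way to aggregate stance and reputation into a veracity outcome is a reputation-weighted average of stances. The Python sketch below illustrates that idea, including user overrides of individual components. It is a hypothetical example, not ExFacto's model: the `Evidence` fields, the stance scale of -1 (refutes) to +1 (supports), and the rescaling of the result to [0, 1] are all assumptions.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Evidence:
    source: str
    stance: float      # -1.0 (refutes the claim) .. +1.0 (supports it), e.g. from a stance model
    reputation: float  # 0.0 .. 1.0, reputation score of the source


def claim_veracity(
    evidence: List[Evidence],
    overrides: Optional[Dict[str, Dict[str, float]]] = None,
) -> float:
    """Reputation-weighted average stance, rescaled to [0, 1] (0 = likely false, 1 = likely true).

    overrides maps a source to user-corrected fields, e.g. {"b.com": {"stance": 1.0}}.
    """
    overrides = overrides or {}
    # Apply user corrections without mutating the model's original outputs.
    items = [replace(e, **overrides.get(e.source, {})) for e in evidence]
    total_reputation = sum(e.reputation for e in items)
    if total_reputation == 0:
        return 0.5  # no credible evidence: undecided
    score = sum(e.stance * e.reputation for e in items) / total_reputation
    return (score + 1) / 2


ev = [Evidence("a.org", 1.0, 0.9), Evidence("b.com", -1.0, 0.3)]
print(claim_veracity(ev))                               # model output only
print(claim_veracity(ev, {"b.com": {"stance": 1.0}}))   # after a user corrects one stance
```

Keeping the overrides separate from the evidence list means the model's original judgments remain available for comparison, which is useful when studying how people interact with the system.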

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
EXPO2O	TBD	Bohan Jiang, Mansooreh Karami, Anique Tahir, Huan Liu	Arizona State University DMML Lab	COVID-19 related online and offline geospatial data from different sources	A concise, intuitive, and interactive data visualization dashboard	Free

Contact: bjiang14@asu.edu

Description: EXPO2O is a web-based dashboard that provides concise, intuitive, and interactive COVID-19 data visualization. It allows users to visualize potential relationships between online-online, offline-offline, and online-offline data. EXPO2O also aims to support interdisciplinary research exploring the relationships between various types of data during a pandemic. In the demo, we will show preliminary findings and insights from the data we have collected so far:

Online data: Google trends data, social media data, news media data;
Offline data: COVID-19 related statistics, US census data, local events/protests/policies.

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
FBAdTracker		Ujun Jeong, Kaize Ding, and Huan Liu	DMML Lab, Arizona State University	Keywords and options for searching Facebook advertisements	Collected Facebook advertisements and analysis of advertisements and advertisers	Free

Contact: ujeong1@asu.edu

Description: The purpose of this application is to provide an integrated data collection and analysis system for current research on fact-checking related to Facebook advertisements. Our system is capable of monitoring up-to-date Facebook ads and analyzing ads retrieved from Facebook Ads Library.

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
Hashtag & URL Coordination	Forthcoming	Isabel Murdock, Lynnette Ng, Tom Magelinski	CMU IDeaS/CASOS	Twitter dataset: a file (JSON or gzipped JSON) or a directory of such files; type of coordination (hashtag, URL, or both); time window (in minutes); output filename (CSV)	Edgelist (user-user-type-weight) for the coordination network that can be imported into ORA (.csv import)	Free

Contact: iem@andrew.cmu.edu

Description: This tool constructs networks of Twitter users who tweet the same hashtags or URLs within a short time window; tight clusters in these networks correspond to coordinated users.
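The construction can be sketched in a few lines of Python. This is a hypothetical illustration, not the tool's code: it assumes posts have already been parsed into `(user, entity, minutes)` tuples, adds one unit of edge weight each time two different users post the same hashtag or URL within the time window, and writes a `source,target,type,weight` CSV. The function names and column names are assumptions; check ORA's CSV import settings for the layout it expects.

```python
import csv
import sys
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, TextIO, Tuple

# (user_id, entity, timestamp_in_minutes); entity is a hashtag or a URL.
Post = Tuple[str, str, float]
Edge = Tuple[str, str, str, int]


def coordination_edges(posts: Iterable[Post], window_minutes: float, kind: str) -> List[Edge]:
    """Return (user, user, type, weight) rows, where weight counts how often the
    two users posted the same entity within window_minutes of each other."""
    by_entity: Dict[str, List[Tuple[float, str]]] = {}
    for user, entity, t in posts:
        by_entity.setdefault(entity, []).append((t, user))

    weights: Counter = Counter()
    for entries in by_entity.values():
        entries.sort()  # time order, so t2 >= t1 for every pair below
        for (t1, u1), (t2, u2) in combinations(entries, 2):
            if u1 != u2 and t2 - t1 <= window_minutes:
                weights[tuple(sorted((u1, u2)))] += 1
    return [(a, b, kind, w) for (a, b), w in sorted(weights.items())]


def write_edgelist(rows: List[Edge], stream: TextIO) -> None:
    """Write the edges as CSV with a header row (column names are an assumption)."""
    writer = csv.writer(stream)
    writer.writerow(["source", "target", "type", "weight"])
    writer.writerows(rows)


posts = [
    ("alice", "#x", 0), ("bob", "#x", 2), ("carol", "#x", 30),
    ("alice", "#y", 5), ("bob", "#y", 6),
]
rows = coordination_edges(posts, window_minutes=5, kind="hashtag")
write_edgelist(rows, sys.stdout)
```

The pairwise loop is quadratic in the number of posts per entity; for very popular hashtags a sliding window over the time-sorted list would scale better.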

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
Hate Speech Detection	Found on CASOS servers; a public version will be available soon.	Joshua Uyheng/Netanomics	CASOS Center, Institute for Software Research, CMU	NetMapper cues files	CSV file with multiple levels of hate speech probabilities	Purchase (educational discount available)

Contact: juyheng@cs.cmu.edu

Description: The CASOS Hate Speech Detection model uses psycholinguistic features to predict the likelihood that a given tweet is hate speech. Because definitions of hate speech are multiple and at times conflicting, our model is trained on multiple datasets and produces likelihoods optimized for these different definitions. This lets users select the predictions best suited to their definition of choice, or run experiments with multiple definitions of hate speech for robustness. Across definitions, the model is trained using theoretically anchored features that allow for meaningful interpretation of results in relation to social identities and conflicts.


Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
Hoaxy		Christopher Torres, Kai-Cheng Yang, Pik-Mai Hui, Alexander Nwala, Matthew DeVerna, John Bryden, Filippo Menczer	Indiana University Observatory on Social Media	Twitter data	A network visualization of interactions (such as retweets, quote tweets, and replies) between accounts. The tool also shows how the network evolved over time, displays bot scores, and provides links to the tweets. The output can be downloaded for later reproduction or external analysis.	Free

Contact: torresch@indiana.edu

Description: Hoaxy provides an easy-to-use way to visualize the spread of information on Twitter. Users can visualize the spread of articles they query from a list of URLs associated with fact-checking sources and low-credibility domains. Additionally, we leverage the Twitter API so that users can visualize any search query that works in the Twitter search bar.

Anatomy of an online misinformation network

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
Image Emotion Classifier	Forthcoming	Isabel Murdock, Lynnette Ng, Tom Magelinski	CMU IDeaS/CASOS	Image	CSV	Free

Contact: iem@andrew.cmu.edu

Description: Image Emotion Classifier uses machine learning to estimate the probability that an image expresses each of Plutchik's eight emotion categories: anger, fear, sadness, disgust, surprise, anticipation, trust, and joy. It is trained on 8,000 Flickr images tagged with these categories. It has been applied in a case study to identify emotions in images posted during an emotionally charged event, Kashmir Black Day.

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
The Lumen Database		Adam Holland, Shreya Tewari, Chris Bavitz, Peter Hankiewicz	Berkman Klein Center for Internet & Society, Harvard University	Copies of requests to remove content from the web, usually submitted as fielded data through our API. Requests to view the data can come through the API or a browser interface.	Notices, either human-readable or JSON	Free

Contact: team@lumendatabase.org

Description: Lumen is an independent research project studying cease and desist letters concerning online content. We collect and aggregate requests to remove material from the web. Initially focused on requests submitted under the United States’ Digital Millennium Copyright Act, Lumen's database, which offers API access for notice submitters and researchers, now includes complaints of all varieties, including those concerning trademark, defamation, and privacy, both domestic and international. The Lumen database currently contains approximately 17 million removal requests referencing 4.5 billion URLs, and it grows by more than 20,000 notices per week from companies such as Google, Twitter, YouTube, Wikipedia, Reddit, Medium, GitHub, Vimeo, and WordPress.

Complete API documentation is available at

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
ORA		Kathleen M. Carley	Netanomics	CSV or JSON network or attribute files	HTML, CSV, JSON	A free lite version and a full professional version for purchase (educational discount available)

Contact: kathleen.carley@cs.cmu.edu

Description: A network analysis toolkit for graphical, statistical and visual analytics on both social networks and high dimensional networks that can vary by time and/or space. ORA is a full function network analytics package that supports the user in creating, importing, exporting, manipulating, editing, analysing, comparing, contrasting, and forecasting changes in one or more networks. ORA pro can handle networks with millions of nodes and includes BEND analytics and a stance detector.

ORA: A Toolkit for Dynamic Network Analysis and Visualization (pdf)

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
NetMapper		Kathleen M. Carley, Eric Malloy	Netanomics	JSON, TXT, PDF, CSV	CSV, or XML for networks	Purchase (educational discount available)

Contact: kathleen.carley@cs.cmu.edu

Description: A computational linguistics tool for extracting semantic networks, meta-networks, sentiment, and cues from texts and social media posts. NetMapper operates in over 40 languages. It can also extract phone numbers, emojis, and emoticons. NetMapper User Guide v12 9/2021 (pdf).

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
PIEGraph		Deen Freelon, Drew Crist, Meredith Pruden	University of North Carolina at Chapel Hill	Web domains from the user's Twitter timeline	See description	Free

Contact: freelon@email.unc.edu

Description: PIEGraph is an interactive chart that displays, as a scatterplot, the web domains that have recently appeared in the user's Twitter feed. The x-axis represents each domain's left-right ideological orientation, and the y-axis represents content credibility. The values for both axes were generated by the Media Bias Fact Check organization (https://mediabiasfactcheck.com/). The size of each bubble represents the relative prevalence of each domain: domains that appear more often have larger bubbles.
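The bubble sizing can be sketched as follows. This is a hypothetical Python illustration, not PIEGraph's code: it extracts domains from a list of URLs, counts them, and scales each size relative to the most frequent domain. The `ratings` mapping stands in for the ideology and credibility values derived from Media Bias Fact Check; its numeric encoding here is an assumption, as are the function names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlparse


def domain_bubble_sizes(urls: Iterable[str], max_size: float = 100.0) -> Dict[str, float]:
    """Map each domain to a bubble size proportional to how often it appears."""
    counts: Counter = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.example.com and example.com as one domain
        if host:
            counts[host] += 1
    if not counts:
        return {}
    top = max(counts.values())
    return {domain: max_size * count / top for domain, count in counts.most_common()}


def bubble_points(
    urls: Iterable[str], ratings: Dict[str, Tuple[float, float]]
) -> List[Tuple[str, float, float, float]]:
    """Return (domain, ideology_x, credibility_y, size) for domains that have ratings."""
    sizes = domain_bubble_sizes(urls)
    return [(d, *ratings[d], size) for d, size in sizes.items() if d in ratings]


timeline_urls = ["https://www.example.com/a", "https://example.com/b", "http://news.test/c"]
print(domain_bubble_sizes(timeline_urls))
print(bubble_points(timeline_urls, {"example.com": (-0.5, 0.8)}))
```

Domains without a rating are dropped from the plot in this sketch; a real chart might instead show them in a separate "unrated" group.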

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
Quiz Creator		Jessica Collier	Center for Media Engagement	Participant responses to multiple-choice or slider questions	Views, quiz starts, completion percentages, and correct/incorrect responses	Free

Contact: jessica.collier@austin.utexas.edu

Description: The Quiz Creator is a simple online tool that uses a step-by-step process to create a quiz in as little as 3 minutes. The interface is customizable for seamless integration on any site. The quiz can be embedded on any page of a website and allows for tests of audience knowledge and response rate. Users can also create A/B tests to see which quizzes are most effective. We have done a series of projects to test the benefit of this tool, including a working paper.

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
smaberta		Megan Brown, Rachel Connolly	Center for Social Media and Politics	Labeled text data	A trained transformer model	Free

Contact: meganbrown@nyu.edu

Description: smaberta is a Python client for creating transformer-based classifiers. Adapted from Simple Transformers, smaberta allows researchers to train, evaluate, and predict with minimal code.

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
StoryGraph	Website: https://storygraph.cs.odu.edu/ Twitter Account:	Alexander Nwala, Kai-Cheng Yang, Pik-Mai Hui, Christopher Torres, Matthew DeVerna, John Bryden, Filippo Menczer	Indiana University Observatory on Social Media	The application reads the RSS feeds of 17 US news organizations; no query input is required from the user.	A graph visualization in which nodes represent news articles and an edge between two nodes indicates a high degree of similarity between them (similar news stories)	Free

Contact: anwala@iu.edu

Description: StoryGraph quantifies the level of attention given to news stories by 17 left-, center-, and right-leaning US news media organizations. Every 10 minutes, the service generates a news similarity graph in which each news story is assigned an attention score indicating the magnitude of attention it receives collectively from the news media organizations.
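A news similarity graph can be illustrated with a small Python sketch. It is not StoryGraph's method: it assumes a simple bag-of-words cosine similarity with a hypothetical threshold of 0.5, and `attention_proxy` is an invented stand-in for the attention score, defined here as the share of outlets that published a story or a story linked to it.

```python
import math
import re
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Story = Tuple[str, str, str]  # (story_id, outlet, title_or_text)


def _bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_graph(stories: List[Story], threshold: float = 0.5) -> Dict[str, Set[str]]:
    """Link every pair of stories whose cosine similarity reaches the threshold."""
    vectors = {sid: _bag_of_words(text) for sid, _, text in stories}
    graph: Dict[str, Set[str]] = {sid: set() for sid, _, _ in stories}
    for (s1, _, _), (s2, _, _) in combinations(stories, 2):
        if _cosine(vectors[s1], vectors[s2]) >= threshold:
            graph[s1].add(s2)
            graph[s2].add(s1)
    return graph


def attention_proxy(stories: List[Story], graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Hypothetical score: share of all outlets that published the story or a linked story."""
    outlet = {sid: source for sid, source, _ in stories}
    n_outlets = len(set(outlet.values()))
    return {
        sid: len({outlet[sid]} | {outlet[n] for n in neighbors}) / n_outlets
        for sid, neighbors in graph.items()
    }


stories = [
    ("a1", "OutletA", "Senate passes budget bill"),
    ("b1", "OutletB", "Senate passes the budget bill"),
    ("c1", "OutletC", "Local team wins championship"),
]
graph = similarity_graph(stories)
print(graph)
print(attention_proxy(stories, graph))
```

A production system would likely use stop-word removal or weighted terms (such as TF-IDF) so that common words do not inflate similarity; plain counts keep the example readable.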

365 Dots in 2019: Quantifying Attention of News Sources

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
TrollHunter	Forthcoming	Joshua Uyheng	CASOS Center, Institute for Software Research, CMU	NetMapper cues files	CSV file with troll probabilities	Will be available soon. Professional version available from Netanomics in 6 months.

Contact: juyheng@cs.cmu.edu

Description: TrollHunter uses psycholinguistic features to predict the likelihood that a given account is a troll. Because definitions of trolling conflict, we opt for an empirically grounded operationalization that emphasizes the use of abusive, targeted language. TrollHunter relies on properties not only of the messages posted by the account of interest, but also of any messages with which that account interacts. This allows for context-aware predictions that align with our understanding of trolling as an interactive and disruptive phenomenon.

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
Twitter Simulation in Construct	http://www.casos.cs.cmu.edu/projects/construct/	Stephen Dipple	CASOS Center, Institute for Software Research, CMU			

Contact: kathleen.carley@cs.cmu.edu

Description: Construct, developed by CASOS, is a multi-agent model of network evolution. In Construct, individuals and groups interact, communicate, learn, and make decisions in a continuous cycle. The program takes into account how agents learn through interactions conducted over different media and how they change their information, beliefs, and activities based on what they learn. This can be used to forecast how a network may evolve and to test whether two groups that appear identical on one dimension actually evolve in the same way.

Training and Sample Data:

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
Twitter V2 Conversation and Timeline Collector and v2 to v1 Tweet Converter		Isabel Murdock, Lynnette Ng, Tom Magelinski	CMU IDeaS/CASOS	No data needed for the collection scripts. For the v2 to v1 tweet converter, input tweets should be in the format of the direct JSON response from the v2 API.	Twitter data in the format of the Twitter API v2	Free

Contact: iem@andrew.cmu.edu

Description: A set of Python scripts for collecting Twitter user timelines, conversations, recent search tweets, full-archive search tweets, sampled stream tweets, filtered stream tweets, and user profile information using the Twitter API v2. The scripts handle the requests and write out the collected data. Additional scripts convert the collected tweets from v2 format to v1 format so that they are compatible with existing tools and software built for v1-format data. The v2 to v1 converter can also be run on its own for data collected through other methods.
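To show what a v2 to v1 conversion involves, the Python sketch below maps a few fields of a v2 response (`data`, `includes.users`, `public_metrics`, `created_at`) onto their v1.1 counterparts (`id_str`, `user.screen_name`, `favorite_count`, and the v1.1 timestamp format). It is a minimal, hypothetical illustration, not the KRN converter, and it covers only a handful of fields; the function name `v2_to_v1` is an assumption.

```python
from datetime import datetime
from typing import Any, Dict, List

V2_TIME = "%Y-%m-%dT%H:%M:%S.%fZ"      # e.g. 2021-10-13T14:30:00.000Z
V1_TIME = "%a %b %d %H:%M:%S +0000 %Y"  # e.g. Wed Oct 13 14:30:00 +0000 2021


def v2_to_v1(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert the tweets in a v2 API response into v1.1-style dictionaries."""
    data = response.get("data", [])
    # Lookup endpoints return a single object; search and timeline endpoints return a list.
    tweets = data if isinstance(data, list) else [data]
    # v2 puts author details in includes.users (when expansions=author_id is requested).
    users = {u["id"]: u for u in response.get("includes", {}).get("users", [])}

    converted = []
    for tweet in tweets:
        author = users.get(tweet.get("author_id"), {})
        metrics = tweet.get("public_metrics", {})
        v1: Dict[str, Any] = {
            "id": int(tweet["id"]),
            "id_str": tweet["id"],
            "text": tweet.get("text", ""),
            "user": {
                "id": int(author["id"]) if "id" in author else None,
                "id_str": author.get("id"),
                "screen_name": author.get("username"),
                "name": author.get("name"),
            },
            "retweet_count": metrics.get("retweet_count", 0),
            "favorite_count": metrics.get("like_count", 0),  # v2 calls likes "like_count"
        }
        if "created_at" in tweet:
            # %a and %b are locale-dependent; v1.1 uses English abbreviations.
            v1["created_at"] = datetime.strptime(tweet["created_at"], V2_TIME).strftime(V1_TIME)
        converted.append(v1)
    return converted


sample = {
    "data": [{"id": "1448", "text": "hi", "author_id": "42",
              "created_at": "2021-10-13T14:30:00.000Z",
              "public_metrics": {"retweet_count": 3, "like_count": 7}}],
    "includes": {"users": [{"id": "42", "username": "krn", "name": "KRN"}]},
}
print(v2_to_v1(sample))
```

Fields that exist only in v1.1 (for example, a user's follower count) would need extra v2 `user.fields` in the original request before they could be filled in.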

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
urlExpander		Megan Brown, Rachel Connolly	Center for Social Media and Politics	Link text (or JSON payloads from some social media sites)	JSON of the expanded links	Free

Contact: meganbrown@nyu.edu

Description: This package makes working with link data from social media and webpages easier. It not only expands links but also catches errors, and it makes parallel link expansion quick and efficient.

Tool	URL	Demonstrator(s)	Company or Center	Input	Output	Free or purchase
youtube-data-api		Megan Brown, Rachel Connolly	Center for Social Media and Politics	Query inputs	JSON outputs	Free

Contact: meganbrown@nyu.edu

Description: youtube-data-api is a Python client to download public YouTube data about channels, videos, and searches.
