Text & Data Mining

A brief guide to tools and resources (including datasets) for getting started with computational approaches to textual analysis.

Voyant

Voyant Tools is an easy-to-use platform for analyzing digital texts. It doesn't require programming skills, and is often a good place to start if you're not sure which form of computational analysis will be best for your project.

Screenshot of the Voyant Tools word count pane

For example, Voyant Tools can count how often each word appears in your text.
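If you are curious what a word frequency count involves behind the scenes, the following is a minimal Python sketch of the idea. It is not how Voyant Tools itself is implemented; the function name `word_frequencies`, the simple tokenizing rule, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=10):
    """Return the top_n most common words in text as (word, count) pairs.

    Hypothetical helper for illustration only. Tokenizing is deliberately
    simple: lowercase the text, then keep runs of the letters a-z,
    optionally joined by a single apostrophe (so "don't" stays one word).
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words).most_common(top_n)


# Made-up sample text, not taken from any real corpus.
sample = "The cat sat on the mat. The mat was flat."

for word, count in word_frequencies(sample, top_n=2):
    print(word, count)
```

A real project would usually also remove very common words such as "the" and "on" (often called stop words) before counting, which Voyant Tools can do for you without any code.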

The Digital Humanities project has produced a Quick Guide to Voyant Tools.

Distant Reader

The Distant Reader is a grant-funded project that provides a tool and platform for analyzing large amounts of digital text (for example, the entire text of a journal). You can provide input text as a plain text file, a URL, or a ZIP file.
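If your texts are spread across many plain text files, you may want to bundle them into a single ZIP file before uploading. The Python sketch below shows one way to do that using only the standard library. The function name `zip_text_files` and the folder and file names are hypothetical, and the sketch assumes a flat archive of `.txt` files is acceptable input; check the Distant Reader's own documentation for its actual requirements.

```python
import zipfile
from pathlib import Path


def zip_text_files(source_dir, zip_path):
    """Bundle every .txt file directly inside source_dir into one ZIP archive.

    Hypothetical helper for illustration only. Files are stored at the top
    level of the archive under their own names. Returns the list of file
    names that were added.
    """
    txt_files = sorted(Path(source_dir).glob("*.txt"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in txt_files:
            # arcname drops the folder path so the archive stays flat
            archive.write(path, arcname=path.name)
    return [path.name for path in txt_files]


if __name__ == "__main__":
    # "my_corpus" and "my_corpus.zip" are placeholder names.
    added = zip_text_files("my_corpus", "my_corpus.zip")
    print(f"Added {len(added)} files")
```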

Screenshot from Distant Reader text analysis platform

Example of output from a Distant Reader analysis

You will need to create a free account to use it. Some familiarity with a language like Python, for web scraping and text processing, will help you get the most out of this tool.

Web scraping and other programming approaches