The Library is now open for students, staff, and faculty of The Claremont Colleges. See COVID-19 Services and Updates for more information.

Text and Data Mining

A brief guide to tools and resources (including datasets) for getting started with computational approaches to textual analysis.

Voyant

Voyant Tools is an easy-to-use platform for analyzing digital texts. It doesn't require programming skills and is often a good place to start if you're not sure which form of computational analysis will best suit your project.

Screenshot of the Voyant Tools word count pane

For example, Voyant Tools can produce a word frequency count for your text.
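Voyant produces counts like these without any code. If you are curious about what a word frequency count involves, here is a minimal sketch using only Python's standard library. The tokenization rule (lowercase everything, keep runs of letters, digits, and apostrophes) is an assumption made for illustration. It will not match Voyant's own tokenizer or stopword handling exactly, so counts may differ.

```python
import re
from collections import Counter


def word_frequencies(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Return the top_n most common words in text, ignoring case."""
    # Assumed tokenization: lowercase, then keep runs of a-z, 0-9, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Counter.most_common sorts by count; ties keep first-seen order.
    return Counter(words).most_common(top_n)


if __name__ == "__main__":
    sample = "It was the best of times, it was the worst of times."
    for word, count in word_frequencies(sample, top_n=3):
        print(word, count)
```

Tools like Voyant typically layer stopword removal on top of this, so that very common words such as "the" and "of" do not dominate the list.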

The Digital Humanities project has produced a Quick Guide to getting started.

Distant Reader

The Distant Reader is a grant-funded project that provides a tool and platform for analyzing large amounts of digital text (for example, the entire text of a journal). You can provide input text as a plain text file, a URL, or a Zip file.
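If your texts are spread across many files, you will need to bundle them into a single Zip file before uploading. The sketch below does this with Python's standard `zipfile` module. It assumes that a flat archive of `.txt` files is an acceptable upload; check the Distant Reader's own documentation for its exact requirements. The function name `zip_text_files` is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_text_files(source_dir: str, zip_path: str) -> list[str]:
    """Bundle every .txt file directly inside source_dir into one Zip archive.

    Hypothetical helper; assumes a flat archive of plain text files is what
    the target platform expects. Returns the names of the files added.
    """
    files = sorted(Path(source_dir).glob("*.txt"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # Store each file under its bare name so the archive has no subfolders.
            zf.write(f, arcname=f.name)
    return [f.name for f in files]


if __name__ == "__main__":
    added = zip_text_files("my_corpus", "my_corpus.zip")
    print(f"Added {len(added)} files")
```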

Screenshot of example output from a Distant Reader analysis

You will need to create a free account. Some familiarity with a programming language such as Python, for web scraping and text processing, will help you get the most out of this tool.
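As an illustration of the kind of Python web scraping mentioned above, here is a minimal sketch that downloads a web page and keeps only its visible text, which you could then save as a plain text file for analysis. It uses only the standard library (`urllib.request` and `html.parser`). The names `TextExtractor`, `html_to_text`, and `fetch_text` are hypothetical. Real projects often use third-party libraries such as Requests and Beautiful Soup instead. Check a site's terms of use before scraping it.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_text(url: str) -> str:
    """Download a page and return its visible text (requires network access)."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))
```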

Constellate

Constellate is a project of ITHAKA's JSTOR Labs.

This beta project provides a Jupyter Notebooks-based dashboard for analyzing the contents of the JSTOR platform, as well as a growing body of other digitized content contributed by project collaborators.

Web scraping and other programming approaches