Researchers applying text and data mining methods may have a well-established corpus of textual data they are already working with, or they may want to apply techniques to a variety of different data sources in a more exploratory way.
We've identified some data sources here that are easy to get started with and offer a lot of potential for research questions across a variety of disciplines.
HathiTrust Research Center
The HathiTrust Research Center (HTRC) provides a platform and training resources to support computational analysis of works in the HathiTrust Digital Library (HTDL) for educational purposes.
The Claremont Colleges Library is a member of the HathiTrust collaborative, which means all Claremont-affiliated users have full access to its collections.
HathiTrust holds digitized materials from member libraries across the globe, including over 18 million monograph titles as well as many types of periodicals, manuscripts, and government documents.
Some of the materials in the HTDL are still protected by copyright, so there may be limitations on their use for text and data mining applications. Please note and abide by the restrictions that you are advised of when working with HathiTrust.
Many training resources are available to help you become familiar with working with the collection.