Last Wednesday the DSSG hosted two excellent events related to machine learning and images, attracting nearly 150 virtual attendees between them.

We first heard from Ben Lee, who presented “Newspaper Navigator: Re-Imagining Digitized Newspapers with Machine Learning” as the third talk in the Spring 2021 Harvard Discovery Series, presented with the Harvard Library. Ben was the 2020 Innovator in Residence at the Library of Congress and is currently a third-year Ph.D. student in the Paul G. Allen School for Computer Science & Engineering at the University of Washington, where he studies human-AI interaction. Working with the Chronicling America collection, a joint initiative of the Library of Congress and the NEH, he used large-scale computer vision and machine learning to create new ways to search the collection’s images and to improve its metadata. For instance, Newspaper Navigator, the application he developed to provide new insights into the Library’s collections, pairs open faceted search with the ability for users to train their own AI navigators to retrieve visually similar content. The application exposes the ML training process to the user and surfaces new predictions in just a few seconds. His slides and a recording of his talk are both available.
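The core retrieval idea behind a user-trained “navigator,” averaging a handful of user-selected example images into a query vector and ranking the collection by similarity to it, can be sketched in a few lines. This is a generic pure-Python illustration with made-up embedding vectors, not Newspaper Navigator’s actual code:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_by_similarity(positives, collection):
    """Average the user's positive examples into one query vector,
    then rank every image embedding by cosine similarity to it."""
    dim = len(positives[0])
    query = [sum(v[i] for v in positives) / len(positives) for i in range(dim)]
    return sorted(collection.items(),
                  key=lambda kv: cosine(query, kv[1]),
                  reverse=True)

# Hypothetical 3-D embeddings for four newspaper images.
embeddings = {
    "map_a":   [0.9, 0.1, 0.0],
    "map_b":   [0.8, 0.2, 0.1],
    "cartoon": [0.1, 0.9, 0.2],
    "photo":   [0.0, 0.2, 0.9],
}

# A user who marks "map_a" as relevant sees the other map ranked next.
ranked = rank_by_similarity([embeddings["map_a"]], embeddings)
print([name for name, _ in ranked])  # ['map_a', 'map_b', 'cartoon', 'photo']
```

Because the “training” step is only an average plus a sort, retraining after each new user selection is nearly instant, which is what makes the interactive, seconds-long feedback loop Ben demonstrated feasible.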

Later that afternoon we had the opportunity to learn about the Distant Viewing Toolkit from Taylor Arnold (University of Richmond), Carol Chiodo (Harvard University), and Lauren Tilton (University of Richmond). In their workshop “Images as Data with Distant Viewing,” participants worked in Google Colab notebooks with tuned models, assembling end-to-end image-processing pipelines that link visual annotation methods and automatically extract metadata from the images. We hope to see these exciting techniques applied to Harvard’s digital collections soon! Check out the recording here.
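The pipeline pattern used in the workshop, a chain of annotators each contributing one piece of metadata per image, can be illustrated with a minimal pure-Python sketch. The annotator names and the nested-list image representation here are invented for illustration; they are not the Distant Viewing Toolkit’s API:

```python
def size_annotator(image):
    """Record the frame's dimensions."""
    return {"height": len(image), "width": len(image[0])}

def brightness_annotator(image):
    """Record the mean pixel value as a rough brightness measure."""
    pixels = [p for row in image for p in row]
    return {"mean_brightness": sum(pixels) / len(pixels)}

def run_pipeline(images, annotators):
    """Apply each annotator to each image and merge the results
    into a single metadata record per image."""
    records = []
    for name, image in images.items():
        record = {"image": name}
        for annotate in annotators:
            record.update(annotate(image))
        records.append(record)
    return records

# Two tiny grayscale "images" as nested lists of pixel values.
images = {
    "dark":  [[10, 20], [30, 40]],
    "light": [[200, 210], [220, 230]],
}
metadata = run_pipeline(images, [size_annotator, brightness_annotator])
print(metadata[0])
# {'image': 'dark', 'height': 2, 'width': 2, 'mean_brightness': 25.0}
```

In a real toolkit the annotators would be trained models (face detection, shot boundaries, object recognition) rather than pixel averages, but the structure is the same: each annotator emits structured metadata, and the pipeline merges them into records ready for search or analysis.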