The Programming Historian recently published a new lesson, Clustering and Visualising Documents using Word Embeddings. Developed by Jonathan Reades and Jennie Williams, this lesson “uses word embeddings and clustering algorithms in Python to identify groups of similar documents in a corpus of approximately 9,000 academic abstracts. It will teach you the basics of dimensionality reduction for extracting structure from a large corpus and how to evaluate your results.”
Part of a special series in partnership with Jisc and The National Archives, the lesson includes background information, a case study, instructions in dimensionality reduction, hierarchical clustering, and validation, and a bibliography that includes other relevant tutorials. It’s listed as high difficulty.