Lincoln A. Mullen (Ph.D. candidate, Brandeis University) has written a series of posts analyzing history dissertations using data from the ProQuest Dissertations and Theses Database. Mullen was able to obtain two “dumps” from the database:
From the MARC records that ProQuest gave me, I’ve been able to extract a number of fields, only some of which are available for each record. There is a ProQuest assigned ID number and sometimes an ISBN. Items always have an author and a title, and almost always a page count and a year of graduation. Many of the dissertations have an abstract, which I think will be useful for mining the topics that historians have studied. There are also Library of Congress subject fields, but these are usually very generic. Almost always a university is listed, along with a “school code” which I think standardizes university names; sometimes a department is listed as well. Some items have the lead adviser and other committee members listed separately; some mash them together; still others don’t have the information at all. I’m hoping the adviser data will let me trace scholarly generations. The degree conferred is always listed. And finally there is a URL to a ProQuest record.
The latest post looks at the gender of the authors in the dataset, using a method that “guesses” gender from first names by comparing the ProQuest data with the Social Security Administration’s names data set. Other posts in the series cover locations, page counts, and cleaning the data, among others.
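To give a rough sense of how this kind of name-based guessing can work, here is a minimal Python sketch. It is not Mullen's code. It assumes the "name,sex,count" row layout used by the SSA baby-name files, uses invented toy counts rather than real SSA figures, and assumes author names are written given name first. The function names and the 0.9 threshold are hypothetical choices made for illustration.

```python
import csv
import io
from collections import defaultdict

# Toy rows in the "name,sex,count" layout of the SSA baby-name files.
# These counts are invented for illustration; they are NOT real SSA data.
SAMPLE_ROWS = """\
Mary,F,900
Mary,M,10
John,M,800
John,F,5
Leslie,F,300
Leslie,M,250
"""


def load_counts(text):
    """Sum the counts per (lowercased) name and sex."""
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        name, sex, count = row
        counts[name.lower()][sex] += int(count)
    return dict(counts)


def guess_gender(full_name, counts, threshold=0.9):
    """Guess a gender from the first name, or admit that we cannot.

    Assumes the name is written "First Last" (an assumption of this sketch).
    `threshold` is the share of one sex required before we commit to a guess.
    """
    parts = full_name.split()
    if not parts:
        return "unknown"
    tally = counts.get(parts[0].lower())
    if tally is None:
        return "unknown"  # name never appears in the reference data
    total = tally["F"] + tally["M"]
    if total == 0:
        return "unknown"
    share_female = tally["F"] / total
    if share_female >= threshold:
        return "female"
    if share_female <= 1 - threshold:
        return "male"
    return "ambiguous"  # name is used for both sexes too evenly to call


counts = load_counts(SAMPLE_ROWS)
for author in ["Mary Smith", "John Doe", "Leslie Jones", "Xavier Roe"]:
    print(author, "->", guess_gender(author, counts))
```

Returning "ambiguous" or "unknown" instead of forcing a guess keeps evenly split or unseen names out of the totals, which matters when the results are aggregated across thousands of dissertations.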