I was unsure whether to post this item. It's in the TNA podcast series but deals with a topic outside the normal range of genealogical interest. I decided to include it because it interests me, and because it mentions a couple of Canadians working in the big data field related to history, if not family history.
Many of us are aware of the Old Bailey Online database, "the largest body of texts detailing the lives of non-elite people ever published, containing 197,745 criminal trials held at London's central criminal court."
What I was unaware of, as explained in this talk by Professor Tim Hitchcock of the University of Hertfordshire, is that the corpus has been extensively marked up with XML, and what that markup permits by way of social analysis.
"This talk explores work to make complex trial accounts totalling 127 million words fully searchable by key word and location on The Old Bailey Online. Surveying a series of projects from geographical data and corpus linguistics to explicit semantics used to make the accounts searchable. It explored the evolution of the British criminal trial and the language used in court that without this work would have otherwise remained impenetrable."One of the Canadian resources referenced is voyant tools for text analysis, "the sort of thing you can get an undergraduate to do without any problem."