DITA’s visit to the British Library Labs Symposium on the 11th November kicked off a packed last few weeks of the term. What a welcoming and inspiring event; it was particularly lovely to get a shout-out to the CityLIS contingent right at the start of proceedings! The keynote presentation by Armand Leroi set the tone for the day with a discussion of how different disciplines can join together to further knowledge – something reflected in the many projects presented later, in which the BL collections had been put to artistic, historical and scientific ends by a myriad of different people.
Armand Leroi’s talk, ‘The Science of Culture’, started with two statements: that “all of culture is essentially becoming digitised”, and that science is concerned with “elucidating causal mechanisms and general causal theories which transcend time and place”. He then advocated for the study of culture to become more scientific, as made possible by its newly digital nature. He illustrated what he meant with many examples, one being the analysis of music into patterns of chords. The distribution of these patterns can then be measured and used to answer, quantitatively, a common question in cultural studies: ‘has commercialisation minimised diversity in music?’ (the answer is no, btw).
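To give a flavour of what ‘measuring the distribution of patterns’ might look like in practice, here is a minimal sketch (mine, not Leroi’s actual method): it counts adjacent chord pairs across a set of songs and scores diversity as the Shannon entropy of that distribution. The chord sequences and the choice of entropy as the measure are illustrative assumptions.

```python
from collections import Counter
import math

def diversity(chord_sequences):
    """Shannon entropy of chord-pair frequencies: higher = more diverse."""
    counts = Counter(pair for seq in chord_sequences
                     for pair in zip(seq, seq[1:]))  # adjacent chord pairs
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical charts: every song using the same I-V-vi-IV loop...
uniform = [["C", "G", "Am", "F"]] * 10
# ...versus a mix of different progressions.
varied = [["C", "G", "Am", "F"], ["Dm", "G", "C", "A"], ["E", "B", "C#m", "A"]]

print(diversity(uniform) < diversity(varied))  # → True
```

With a measure like this in hand, ‘has diversity in the charts declined?’ stops being a matter of taste and becomes a number you can track over time.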
In order to achieve the scientific study of culture, Leroi made clear, it is necessary to have everyone on board; no one discipline can achieve this alone. Librarians, scholars, scientists and engineers would all need to contribute. I should mention that Leroi himself is an evolutionary developmental biologist and, as he pointed out, he could not have completed the work he went on to speak about, ‘The evolution of popular music: USA 1960–2010’, without his colleagues from other disciplines.
The following week I attended The National Archives Cataloguing Day, an event giving staff at the Archives an opportunity to present their work. The programme of talks varied widely, but as with the presentations at the BL Labs Symposium, I was struck by the common thread of passion that ran through each one; many of these were projects done wholly or partly in the presenters’ own time, and their enthusiasm was infectious. Of particular interest in the DITA context of this post was Mark Bell’s presentation, ‘Automating Content’, in which he posed the question ‘can a computer ever write catalogue descriptions?’ (spoiler alert: it can… but not well!).
Teaching a computer to recognise the things that humans do comes with real challenges – qualitative spatial reasoning and linguistic inference, for example, remain difficult even with artificial intelligence.
Even using entity recognition, it is still difficult to get a computer to differentiate between numbers on a page: it must be taught where to look, not just what to look for. Training systems to do this is theoretically straightforward; after all, it is by experience of page numbering that humans are able to pick page numbers out of a text. One of the problems, though, is that many available proprietary algorithms for dealing with text require a large corpus for their training, and the corpora most readily available tend, obviously, to be things like newspaper archives. This makes the AI great at interpreting and extracting meaning (and page numbers) from newspapers, but not so great at doing the same for a digitised 19th-century catalogue.
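The ‘where to look, not just what to look for’ point can be made concrete with a toy heuristic (my own illustration, nothing to do with the Archives’ actual system): a number standing alone on its own line is treated as a page number, while the very same digits embedded in running text are not.

```python
import re

def find_page_numbers(lines):
    """Toy heuristic: a short number that is the *entire* line is likely a
    page number; the same digits inside running prose are not."""
    pages = []
    for line in lines:
        if re.fullmatch(r"\d{1,4}", line.strip()):  # position, not just pattern
            pages.append(int(line.strip()))
    return pages

lines = [
    "42",                          # isolated on its own line -> page number
    "The regiment lost 42 men.",   # embedded -> part of the prose
    "Catalogue of maps, 1873",     # a date in running text, not a page
]
print(find_page_numbers(lines))  # → [42]
```

A rule this crude works only on material laid out like the examples it was written against – which is exactly the newspaper-versus-catalogue problem in miniature.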
Extracting meaning is also not as simple as it may seem. Mark gave the example of wishing to extract the names of embassy staff from a document so they could be used in its record. The passage in which the two names appear does not refer to them directly as embassy staff; rather, the account following the names – “Two men came one day, saying that they were from the Embassy…” – must be understood to refer to them. This kind of linguistic inference is not straightforward, and it is difficult at present to supply a training set which would adequately teach an AI to do it.
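To see why surface pattern-matching fails here, consider a naive extractor that grabs names appearing near the word ‘Embassy’. The passage and the names below are entirely hypothetical (the talk did not give the actual text), but the shape of the failure is the point: the names and the keyword sit in different sentences, linked only by inference.

```python
import re

# Hypothetical passage - the names and wording are invented for illustration.
TEXT = ("Mr Brown and Mr Green were recorded in the ledger that spring. "
        "Two men came one day, saying that they were from the Embassy...")

def names_near(text, keyword, window=30):
    """Naive extractor: 'Mr X' names within `window` characters of the
    keyword, using surface proximity as a stand-in for meaning."""
    i = text.find(keyword)
    context = text[max(0, i - window): i + len(keyword) + window]
    return re.findall(r"Mr [A-Z][a-z]+", context)

print(names_near(TEXT, "Embassy"))  # → []: the names are connected to the
# Embassy only by inference across sentences, not by proximity on the page.
```

A human cataloguer makes the ‘two men’ = ‘Mr Brown and Mr Green’ leap without noticing; the proximity rule never will, however wide you make the window short of swallowing the whole document.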
Ultimately, he concluded, computers see texts as ‘bags of words’, when it is so often the structure that holds the meaning we wish to extract in creating a good catalogue record; so this will remain a computer–human hybrid task for the foreseeable future.
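The ‘bag of words’ idea is easy to demonstrate: reduce a text to unordered word counts and two sentences describing opposite events become indistinguishable. A minimal sketch:

```python
from collections import Counter

def bag_of_words(text):
    """Reduce a text to unordered word counts, discarding all structure."""
    return Counter(text.lower().split())

a = "the dog bit the man"
b = "the man bit the dog"
print(bag_of_words(a) == bag_of_words(b))  # → True: identical bags,
# though a cataloguer would describe the two events very differently.
```

Everything that distinguishes the two sentences lives in their word order – precisely the structure a bag-of-words view throws away.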
This application of AI in LIS linked in nicely with the following week’s ‘AI will replace you’ class, where lively discussion of Floridi’s paper, ‘What the Near Future of Artificial Intelligence Could Be’ kicked off the session.
The problem of closed systems – the ‘black box’ – was highlighted again as problematic in AI. We had touched on this the previous week in discussing how varied the results from search and text-analysis algorithms can be. In the case of marketing visuals this may be no cause for alarm, but it certainly becomes more problematic in the text analysis of literature, for example, where conclusions may rely on the data spat out by the algorithm, or in the possibly biased presentation of search results to a user. In the context of automated decision-making, it is imperative that we open up the black box to understand how decisions affecting real-world humans are made by digital machines (I’ve written about ethics in AI in a previous post here). I’d like to return to this discussion in another post, seeing as I am trying to write shorter posts and there was far too much good stuff in those two classes to cover quickly here!
Finally, I would like to reflect that it remains one of my favourite aspects of this cohort that we all bring such varied views to the discussion, yet everyone listens and considers other points of view (we could do with more of that in the wider world, imo). It was not surprising at this point, then, to hear from those at one end of the spectrum – ‘utterly terrified of what may come from AI’ – through to those at the other – ‘positively welcoming our robot overlords’. Full disclosure: I started off from a different place entirely, one that can be summed up as ‘not convinced by the whole thing’; despite regularly thanking self-service checkouts and ‘talking round’ misbehaving IT equipment, I don’t see AI as anything but the complex processing of information by machines – the lights are on, but nobody’s messaging you with them.
Our class has changed my view somewhat; I’m now ‘welcoming the robot overlords’ (I’m sure they will do a better job than our current human overlord contingent), but with the caveat that I don’t believe AI will ever achieve enough sentience to care to take over (though we should probably treat them nicely anyway, just in case…).