My MA practical project was the result of narrowing down some initially very broad research into machine learning. As you can see from the diagram below (which I would never claim is even remotely exhaustive in what it purports to represent), the field is far too extensive and complex to be investigated by one student over a few months. As I whittled down my interests, I ended up writing a piece of software called snu-snu. It trains Amazon's recommendation system by carrying out searches and adding items to wish lists, then scrapes the resulting recommendations from Amazon and saves them for later analysis. I am calling the more recent iteration of this software RARS, which stands for Research Amazon's Recommendation System.

The areas outlined in green show my initial interests for this project.

In this post, I will introduce you to some of the ideas I've been working with so far, the software I developed and my plans for deploying this software as a more user-friendly web application.

The Ideas

My initial research was a little aimless. I ran some simple natural language processing on a number of books, then queried and trained Amazon with the most frequently occurring words (sans articles such as 'the' and other boring words). The results were largely inexplicable, and I was certain that any interpretation on my part would have about as much epistemological clout as a horoscope. Particularly hard to account for were the recommendations in the table below, which were garnered by keywords from none other than Adolf Hitler's Mein Kampf:

In my preliminary research, I discovered that recommendation systems rely on a number of algorithms. Some of these – such as clustering and collaborative filtering – group users according to similar behaviour. Based on this, the recommendations output by Amazon for a set of related accounts might be read as part of an algorithmically generated social profile. Bearing this in mind, the above can only be explained by Hitler's political proclivities being typical of those of parents of toddlers – or more sinisterly, of childless neo-Nazis with penchants for children's toys.
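To make "grouping users by similar behaviour" a little more concrete, here is a minimal sketch of user-based collaborative filtering. It is a generic textbook illustration and not a description of Amazon's actual system. Each user is represented as the set of items they have bought or wish-listed, and similarity is the cosine similarity of those binary vectors. A user is then offered items that their most similar neighbours have and they lack. The users, items and the helpers `cosine_similarity` and `recommend` are all made up for illustration.

```python
from math import sqrt

# Toy, made-up data: user -> set of item IDs they bought or wish-listed.
INTERACTIONS = {
    "alice": {"crayons", "toy_train", "picture_book"},
    "bob": {"crayons", "toy_train", "nappies"},
    "carol": {"history_book", "atlas"},
}


def cosine_similarity(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two binary item vectors, represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user: str, data: dict[str, set[str]], k: int = 2) -> list[str]:
    """Score items held by the k most similar users but not yet by `user`."""
    own = data[user]
    similarities = {
        other: cosine_similarity(own, items)
        for other, items in data.items()
        if other != user
    }
    neighbours = sorted(similarities, key=similarities.get, reverse=True)[:k]
    scores: dict[str, float] = {}
    for other in neighbours:
        weight = similarities[other]
        if weight == 0.0:
            continue  # users with no overlap contribute nothing
        for item in data[other] - own:
            scores[item] = scores.get(item, 0.0) + weight
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda item: (-scores[item], item))


print(recommend("alice", INTERACTIONS))  # prints ['nappies']
```

Alice's list overlaps most with Bob's, so she is offered the one item he has that she lacks. In miniature, this shows how shared behaviour can pull in items associated with a different group of users.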

Because the conclusions I could draw were so nebulous, I have been looking for a more systematic approach. Inspired by Richard Rogers' book Digital Methods, I'm interested in an empirical methodology that works mainly with digitally native objects. One such approach would involve surveying members of the public to find out which categories of consumer goods, such as handbags and boots, they associate with stereotyped groups such as the middle classes, those who are feminine, people of colour or homosexuals. I could then test Amazon empirically to see whether the logic of its recommendation system produces something analogous to our social categorisations.
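One simple way to put a number on "something analogous" is to compare two sets of product categories. The first is the set survey respondents associate with a group. The second is the set appearing in recommendations scraped from accounts trained with related queries. The sketch below assumes the Jaccard index (size of the intersection divided by size of the union) as that measure. The choice of metric is my assumption, and the category names are invented placeholders, not survey responses or Amazon output.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union; 1.0 = identical."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical placeholder data, not real survey responses or Amazon output.
survey_categories = {"handbags", "boots", "scented candles"}
recommended_categories = {"boots", "scented candles", "garden tools", "novels"}

overlap = jaccard(survey_categories, recommended_categories)
print(f"Jaccard overlap: {overlap:.2f}")  # prints "Jaccard overlap: 0.40"
```

A single score says little on its own. It would need comparing against a baseline, such as accounts trained with unrelated queries, before anything could be read into it.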

I have also found material that complements this research from a less empirically grounded source: the ideas of social subjection and machinic enslavement put forward by the sociologist and philosopher Maurizio Lazzarato. From page 12 of his book Signs and Machines: Capitalism and the Production of Subjectivity:

In capitalism, the production of subjectivity works in two ways through what Deleuze and Guattari call apparatuses [dispositifs] of social subjection and machinic enslavement. Social subjection equips us with an identity, a sex, a body, a profession, a nationality, and so on. In response to the needs of the social division of labour, it in this way manufactures individuated subjects, their consciousness, representations and behaviour[…] Machinic enslavement dismantles the individuated subject, consciousness and representations, acting on both the pre-individual and supra-individual levels.

On my reading, collaborative filtering operates at the junction of the forces of social subjection and machinic enslavement for two reasons. First, individual consumers become "dividuals": they are reduced to aggregations of electronic data and processed in ways that cross the boundaries between different subjects, between subject and object, and between human and non-human. Second, a categorisation simultaneously occurs that may echo the one that takes place at the subjective levels of class, race, gender and so forth. It does seem methodologically unusual to mix empirical research with highly abstract sociology, but I might as well take advantage of the freedom my course affords me in this regard.

The Old Software

The snusnu Python library is currently hosted on my GitHub at https://github.com/simoncrowe/snu-snu. I wouldn't recommend trying to use it unless you know a bit of Python and don't mind getting your hands dirty. Most of its functionality requires Selenium and ChromeDriver, which can take a bit of work to set up. (Note: the screenshots below are out of date, and snusnu now needs to be treated as a package. To use it, either run 'pip install git+https://github.com/simoncrowe/snusnu.git' in your terminal, or run the Python interpreter from snu-snu's parent directory and import what you need from it, e.g. 'from snusnu import terminal'.)

Using the script terminal.py, you can log into Amazon and then either type in as many queries as you want carried out or, as in the screenshot below, specify a JSON file containing a list of queries.
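For anyone adapting the code, reading a query list like this needs only the standard library. This is not snu-snu's actual loader. `load_queries` is a hypothetical helper, and it assumes the file holds a flat JSON array of strings such as `["waves", "sea", "sun"]`. It validates that shape before anything would be sent to Amazon.

```python
import json
from pathlib import Path
from typing import Union


def load_queries(path: Union[str, Path]) -> list[str]:
    """Load search queries from a JSON file holding a flat array of strings.

    Hypothetical helper: assumes the file format is a JSON array of strings.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list) or not all(isinstance(q, str) for q in data):
        raise ValueError(f"{path} must contain a JSON array of strings")
    # Trim whitespace and drop empty entries so no blank searches are made.
    return [q.strip() for q in data if q.strip()]
```

Each returned string would then be passed, one at a time, to whatever routine drives the browser search. That part depends on Selenium and is left out here.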

The JSON lists of queries are generated by another script called text_process.py. This uses frequency analysis and part-of-speech tagging from NLTK (the Natural Language Toolkit) to derive a list of words from a text. In the screenshot below, The Waves by Virginia Woolf is being processed.
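Here is a rough sketch of the kind of processing described above, assuming the tagger's output format. `nltk.pos_tag` returns a list of `(token, tag)` pairs using Penn Treebank tags. To keep the example runnable without downloading NLTK's models, those pairs are tagged by hand here from the opening sentences of The Waves. Keeping only nouns and adjectives (which also discards words like 'the') and the helper name `top_queries` are my assumptions, not necessarily what the original script does.

```python
import json
from collections import Counter

# Hand-tagged stand-in for nltk.pos_tag(nltk.word_tokenize(text)) on the
# opening of The Waves; tags follow the Penn Treebank convention NLTK uses.
TAGGED = [
    ("The", "DT"), ("sun", "NN"), ("had", "VBD"), ("not", "RB"), ("yet", "RB"),
    ("risen", "VBN"), (".", "."), ("The", "DT"), ("sea", "NN"), ("was", "VBD"),
    ("indistinguishable", "JJ"), ("from", "IN"), ("the", "DT"), ("sky", "NN"),
    (",", ","), ("except", "IN"), ("that", "IN"), ("the", "DT"), ("sea", "NN"),
    ("was", "VBD"), ("slightly", "RB"), ("creased", "VBN"), ("as", "IN"),
    ("if", "IN"), ("a", "DT"), ("cloth", "NN"), ("had", "VBD"),
    ("wrinkles", "NNS"), ("in", "IN"), ("it", "PRP"), (".", "."),
]

# Assumption: nouns (NN, NNS, NNP, NNPS) and adjectives (JJ, JJR, JJS) make
# better shopping queries than function words, so only those are counted.
KEEP_TAG_PREFIXES = ("NN", "JJ")


def top_queries(tagged_tokens: list[tuple[str, str]], n: int = 10) -> list[str]:
    """Return the n most frequent kept words, most frequent first."""
    counts = Counter(
        word.lower()
        for word, tag in tagged_tokens
        if tag.startswith(KEEP_TAG_PREFIXES) and word.isalpha()
    )
    # Counter.most_common orders ties by first occurrence.
    return [word for word, _ in counts.most_common(n)]


queries = top_queries(TAGGED, n=5)
# Serialise as a flat JSON array, the shape the query file is assumed to have.
queries_json = json.dumps(queries)
print(queries_json)
```

With a whole novel rather than two sentences, the frequency counts become more meaningful, and the cut-off `n` controls how many searches an account is trained with.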

The New Software

To get a better sense of which parts of the first version of my software I could reuse, I drew up a flowchart to show one of my tutors; part of it is below. He quite quickly suggested that I drop the terminal interface, put the core functionality on a server and write a web application as a front end to it.

Over the past few weeks, I've been getting my head around the Django web application framework and attempting to integrate my existing code into it. I've made a bit of progress so far, most of which I owe to the help of fellow student and software developer Fabio Natali. He has advised me on how to go about developing and deploying the application and has critiqued a number of wireframes I've sketched out for the user interface. The image below is from the latest set of wireframes and is nearing something workable.

My somewhat ambitious plan for the next month is to develop a simple web application for investigating Amazon's recommendation system and, ideally, deploy it for free on Amazon Web Services. I'll write further posts as the project progresses.