A couple of articles from the two different domains of my professional life made me riff on some old memes. The first was an article in (I think) the Times Higher about the fraud detection software Turnitin. I do not have any firsthand experience with Turnitin (‘turn-it-in’), as most of our exams use either clinical assessments or MCQs. My understanding is that submitted summative work is uploaded to Turnitin and the text is compared with the corpus of text already collected. If strong similarities are present, the work might be fraudulent. A numerical score is provided, but some interpretation is necessary, because in many domains there will be a lot of ‘stock phrases’ that are part of domain expertise, rather than evidence of cheating. How was the ‘corpus’ of text collected? Well, of course, from earlier student texts that had been uploaded.
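To make the idea of a numerical score concrete, here is a toy sketch of how such a comparison might work. I have no knowledge of how Turnitin actually computes its score; the function names (`shingles`, `similarity_score`) and the choice of three-word sequences are my own assumptions, purely for illustration. The score is simply the fraction of the submission's three-word sequences that already appear somewhere in the corpus.

```python
def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in text (lowercased).

    Splits on whitespace only; punctuation is left attached to words.
    """
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity_score(submission, corpus, n=3):
    """Fraction of the submission's n-word sequences found anywhere in the corpus.

    Hypothetical scoring rule for illustration, not Turnitin's method.
    """
    sub = shingles(submission, n)
    if not sub:
        return 0.0  # too short to compare
    seen = set()
    for doc in corpus:
        seen |= shingles(doc, n)
    return len(sub & seen) / len(sub)


# An earlier (made-up) student text forms the corpus.
corpus = ["the patient presented with a pigmented lesion on the back"]
print(similarity_score("the patient presented with a rash", corpus))
```

The short submission shares only the stock phrase ‘the patient presented with a’ with the corpus, yet three of its four three-word sequences match. That is why the raw number needs interpretation.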
Universities need to pay for this service because, in the age of massification, lecturers do not recognise the writing style of the students they teach. (BTW, as Graham Gibbs has pointed out, the move from formal supervised exams to coursework has been a key driver of grade inflation in UK universities.)
I do not know who owns the rights to the texts students submit, nor whether they are able to assert any property rights. There may be other companies out there apart from Turnitin, but you can easily see that the more data they collect, the more powerful their software becomes. If the substrate is free, then the costs relate to how powerful their algorithms are. It is easy to imagine how this becomes a monopoly. However, if copies of all the submitted texts are kept by universities, then collectively they would make it easier for a challenger to enter the field. But network effects will still operate.
The other example comes from medicine rather than education. The FT ran a story about the use of ‘machine learning’ to diagnose retinal scans. Many groups are working on this, but this report was about Moorfields in London. I think I read that, as the work was being commercialised, the hospital would have access to the commercial software free of charge. There are several issues here.
Although I have no expert knowledge in this particular domain, I know a little about skin cancer diagnosis using automated methods. First, the clinical material and the annotation of that material are absolutely rate limiting. Second, once the system is commercialised, the more subsequent images that can be uploaded, the better you would imagine the system will become. This of course requires further image annotation, but if we are interested in improving diagnosis, we should keep enlarging the database as long as the costs of annotation are acceptable. As in the Turnitin example, the danger is that the monopoly provider becomes ever more powerful. Again, if the image use remains non-exclusive, then there are lower barriers to entry.