I spent near on ten years thinking about automated skin cancer detection. There are various approaches you might use — cyborg human/machine hybrids were my personal favourite — but we settled on more standard machine learning approaches. Conceptually what you need is straightforward: data to learn from, and ways to lever the historical data to the future examples. The following quote is apposite.
One is that, for all the advances in machine learning, machines are still not very good at learning. Most humans need a few dozen hours to master driving. Waymo’s cars have had over 10m miles of practice, and still fall short. And once humans have learned to drive, even on the easy streets of Phoenix, they can, with a little effort, apply that knowledge anywhere, rapidly learning to adapt their skills to rush-hour Bangkok or a gravel-track in rural Greece.
You see exactly the same thing with skin cancer. With a relatively small number of examples, you can train (human) novices to be much better than most doctors. By contrast, with the machines you need literally hundreds and thousands of examples. Even when you start with large databases, as you parse the diagnostic groups, you quickly find out that for many ‘types’ you have only a few examples to learn from. The rate limiting factor becomes acquiring mega-databases cheaply. The best way to do this is to change data acquisition from a ‘research task’ to a matter of grabbing data that was collected routinely for other purposes (there is a lot of money in digital waste — ask Google).
Noam Chomsky had a few statements germane to this and much else that gets in the way of such goals (1).
Plato’s problem: How can we know so much when the evidence is do slight.
Orwell’s problem: How do we remain so ignorant when the evidence is so overwhelming.
(1): Noam Chomsky: Ideas and Ideals, Cambridge University Press, (1999). Neil Smith.