os_identify: A simple image classifier for operating system screenshots
I’ve been working through Jeremy Howard’s excellent Practical Deep Learning for Coders course, and I am pleased that the course offers ample opportunities to put its lessons into practice. Deep learning is a subfield of machine learning that focuses on training artificial neural networks to recognize patterns in data. For my first project, I decided to build a convolutional neural network (CNN) that classifies screenshots by the operating system they depict. A CNN learns a set of filters (its weights) during training, and those filters pick out the visual features that distinguish one class of image from another. Since identifying an operating system at a glance is something that a knowledgeable person can do well, I thought it would be a good test for an AI and a fun learning project. The fastai library uses PyTorch under the hood, and it provides a very comfortable and user-friendly interface. With only a few lines of code, I was able to fine-tune a pretrained resnet18 model for four epochs with a batch size of 64. Training on a cloud GPU took only a few minutes.
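For readers who want to see what "a few lines of code" might look like, here is a minimal sketch rather than the exact training script. The epoch count, batch size, and resnet18 come from the description above. Everything else is an assumption made for illustration: the folder layout (one sub-folder per operating system), the 20% validation split, the seed, the 224-pixel resize, and the helper names `class_names` and `train`.

```python
from pathlib import Path

EPOCHS = 4        # stated in the post
BATCH_SIZE = 64   # stated in the post


def class_names(data_dir):
    """Return the label names implied by the sub-folders of data_dir.

    Assumed layout (hypothetical): data_dir/<os_name>/<screenshot files>
    """
    return sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())


def train(data_dir):
    """Fine-tune a pretrained resnet18 on the screenshots under data_dir."""
    # Imported here so class_names() can be used without fastai installed.
    from fastai.vision.all import (
        ImageDataLoaders, Resize, error_rate, resnet18, vision_learner,
    )

    if len(class_names(data_dir)) < 2:
        raise ValueError("expected at least two class folders")

    dls = ImageDataLoaders.from_folder(
        data_dir,
        valid_pct=0.2,          # assumption: hold out 20% for validation
        seed=42,                # assumption: fixed seed for a repeatable split
        item_tfms=Resize(224),  # assumption: resize every image to 224x224
        bs=BATCH_SIZE,
    )
    learn = vision_learner(dls, resnet18, metrics=error_rate)
    learn.fine_tune(EPOCHS)
    return learn
```

With a folder such as `screenshots/windows`, `screenshots/macos`, and `screenshots/linux` (hypothetical names), `train("screenshots")` would return a fastai `Learner` that can then be exported for use in an app.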
I’ve embedded a Gradio app below that is built around the model. You can submit your own screenshots or click through the examples to test the model’s predictions.
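An app like this one can be quite small. The sketch below shows one way it might be wired up and is not the code behind the embedded demo. The model file name `export.pkl`, the choice to show the top three classes, and the helper names `to_confidences` and `build_app` are all assumptions.

```python
def to_confidences(vocab, probs):
    """Pair each class label with its predicted probability as a plain float."""
    return {str(label): float(p) for label, p in zip(vocab, probs)}


def build_app(model_path="export.pkl", examples=None):
    """Build a Gradio interface around an exported fastai learner.

    model_path is a hypothetical file name for a learner saved with
    learn.export(); examples is an optional list of example image paths.
    """
    # Imported here so to_confidences() works without these packages installed.
    import gradio as gr
    from fastai.vision.all import PILImage, load_learner

    learn = load_learner(model_path)

    def classify(img):
        # gr.Image() passes the uploaded image as a NumPy array by default.
        _, _, probs = learn.predict(PILImage.create(img))
        return to_confidences(learn.dls.vocab, probs)

    return gr.Interface(
        fn=classify,
        inputs=gr.Image(),
        outputs=gr.Label(num_top_classes=3),  # assumption: show the top 3 guesses
        examples=examples,
    )
```

Calling `build_app().launch()` would start a local server with an upload box and a ranked list of predicted operating systems.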
This model does astoundingly well considering how little code is required. Its predictions were more than 95% accurate on its validation set (a random subset of the labelled data that is held out from training and used to measure how well the model generalizes). Anecdotally, it seems to guess correctly (though not always confidently) on most screenshots that show native UI elements. It gets tripped up a bit by screenshots of large browser windows. Since Chrome and Firefox look about the same on any OS, I think that is understandable. With more and better training data, I suspect it could use even finer categories and distinguish major OS releases. As I’m quickly learning, one of the biggest barriers to building these models is the availability of high-quality labelled data. It makes me wonder how much each of us has individually contributed to various machine learning models by solving CAPTCHAs over the years.
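Accuracy and confidence are two different measurements, which is why a guess can be right without being confident. The sketch below illustrates the distinction in plain Python. The function names are hypothetical, and the labels and probabilities in the usage note are made up for illustration and are not results from the model.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def top_prediction(probs):
    """Return the most likely label and the probability assigned to it."""
    label = max(probs, key=probs.get)
    return label, probs[label]
```

For example, `accuracy(["macos", "windows", "linux", "windows"], ["macos", "windows", "linux", "macos"])` is 0.75, and `top_prediction({"windows": 0.48, "linux": 0.41, "macos": 0.11})` picks "windows" with a probability of only 0.48. If the screenshot really were from Windows, that guess would be correct but not confident.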
I’ve been fascinated by the potential of machine learning for a long time, but I stood on the sidelines thinking that we were still years away from these types of results. I didn’t expect the field to move this quickly or for the technology to be this accessible. If you can relate to that, then I hope this blog post inspires you to try some projects of your own!