With the explosion of social media have come new ways in which children can get into unsafe situations online. How can we encourage youth to explore freely while also making sure they stay safe from cyberbullying and other online abuse?
In a recent project, Strong Analytics tackled this problem in collaboration with a fast-growing technology company. We helped them leverage state-of-the-art machine learning and natural language processing to identify abusive content to which children were being exposed online.
Facing challenges of scale and style
The scale and style of children's online content both posed significant challenges in this project.
In terms of scale, it's no secret that younger generations have a voracious appetite for online content. The average teenager spends almost 9 hours per day online. Successfully applying machine learning to this problem meant handling millions of pieces of media and natural language data per day.
Moreover, the style of children's online communication is unique and ever-evolving. Traditional language models and text processing pipelines would not suffice here, as they would risk missing abusive content hidden in the nuance and complexity of this language.
We worked with our client to design and build a custom, end-to-end research and modeling platform that used several custom-trained, interwoven models to identify abusive text and media.
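As a rough illustration of what interweaving models can look like, the sketch below combines the outputs of separate text and media scorers into a single abuse flag. The scoring functions, tags, and threshold are illustrative assumptions, not the actual production design.

```python
# Hypothetical sketch: combine independent text and media model scores.
# All function names and the combination rule are assumptions for illustration.

def text_score(text: str) -> float:
    """Stand-in for a trained text classifier's abuse probability."""
    return 0.9 if "abusive" in text.lower() else 0.1

def media_score(media_tags: list[str]) -> float:
    """Stand-in for a trained media classifier's abuse probability."""
    return 0.8 if "violent" in media_tags else 0.1

def flag_content(text: str, media_tags: list[str], threshold: float = 0.5) -> bool:
    """Flag a content piece if either model is confident it is abusive."""
    return max(text_score(text), media_score(media_tags)) >= threshold

flag_content("friendly message", [])          # benign
flag_content("abusive message", ["violent"])  # flagged
```

Taking the maximum of the two scores is just one simple combination rule; a production system might instead learn a joint model over both signals.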
To deal with scale, we built the platform such that the pipeline could ingest tens of millions of content pieces per day through efficient preprocessing and, critically, parallelization across a scalable cluster of API servers.
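A minimal sketch of that kind of parallel ingestion, assuming a thread pool spread across workers; the `preprocess` and `classify_stub` functions are hypothetical placeholders, not the client's actual pipeline code:

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess(text: str) -> str:
    """Normalize a raw content piece before classification (placeholder)."""
    return text.strip().lower()

def classify_stub(text: str) -> bool:
    """Placeholder classifier: flags content containing a watchlisted term."""
    watchlist = {"abusive"}
    return any(term in text for term in watchlist)

def process_batch(items: list[str], max_workers: int = 8) -> list[bool]:
    """Preprocess and classify a batch of items across parallel workers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        cleaned = list(pool.map(preprocess, items))
        return list(pool.map(classify_stub, cleaned))

flags = process_batch(["  Hello there!  ", "this is ABUSIVE content"])
```

At real scale the same fan-out pattern would run across a cluster of API servers rather than threads in one process, but the structure of the pipeline is the same.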
To deal with the nuances of children's online communication, we delivered more than just a model: we built a research platform that enables adaptive model updating and continuous integration of updated models into production.
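The idea of adaptive updating can be sketched with a toy classifier whose vocabulary grows as moderators label new slang; the class and its methods are illustrative assumptions, standing in for a real retrain-and-redeploy loop.

```python
# Hypothetical sketch: a classifier that adapts as new labels arrive.

class AdaptiveAbuseModel:
    def __init__(self, seed_terms: set[str]):
        # Start from an initial vocabulary of known abusive terms.
        self.terms = set(seed_terms)

    def predict(self, text: str) -> bool:
        """Flag text if it contains any known abusive term."""
        return bool(set(text.lower().split()) & self.terms)

    def update(self, labeled_examples: list[tuple[str, bool]]) -> None:
        """Fold newly labeled abusive phrases into the vocabulary."""
        for text, is_abusive in labeled_examples:
            if is_abusive:
                self.terms |= set(text.lower().split())

model = AdaptiveAbuseModel({"bully"})
model.predict("new slangword here")        # missed before the update
model.update([("slangword", True)])
model.predict("new slangword here")        # caught after the update
```

A production platform would of course retrain statistical models rather than grow a word list, but the loop is the same: collect fresh labels, update the model, and push it back into production continuously.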