I’ve always been interested in automatic content creation, be it text or pictures (e.g., fractal art and similar). Just very recently, though, I found myself brainstorming ever so often about what kind of videos could be automatically generated with a decent enough quality (for a very generous definition of decent). After several ideas that were mostly too complex to implement in a reasonable amount of time, I realized that there could be a way to create engaging videos automatically, based on the content of a Wikipedia page.
Wikipedia is perfect because every major topic is usually covered in great detail, and there are hundreds of such topics so that our approach can be tested in a variety of settings.
The basic idea is to create a video that will illustrate the content of the Wikipedia page: the audio track will be made of its text while the video part will be a slideshow of pictures of the topic at hand, extracted from Wikipedia and other sources. As I started experimenting and building the system, I came across several issues and possible improvements to this basic concept, so keep reading if I’ve piqued your interest!
1. Design of the system
Our only requirement is that the system will work with any Wikipedia page, with no more user input than the title of the page itself. It will output an mp4 file with the final video with no human intervention. The system will work in the following way:
- Get a Wikipedia page in input and retrieve its content;
- Parse it to define the structure of the video: we want to divide the video in multiple sections following the structure of the page;
- Retrieve any other additional information from e.g., Wikidata and pictures/videos from Pixabay;
- Build a storyboard…