The FutureBase project aimed to support strategic forecasting using data sources of varying scale (from small to big data) and was funded through the Hague Security Delta. In addition to SURFsara, The Hague Centre for Strategic Studies (HCSS), the University of Leiden, AGT International and the Royal Library took part in the project.
Different data sets for two security-related themes, Russia and drones, were collected, analyzed and visualized. Within the project we worked together with the SURFsara Scalable Data Analytics group to perform text-based analysis (topic modeling using LDA, n-gram extraction) on roughly 10 million documents related to Russia and drones. The documents consisted of tweets, articles (PDFs) and webpages. Another data source was the Common Crawl, a freely available repository of web crawl data.
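To illustrate what n-gram statistics look like in practice, here is a minimal sketch in Python that counts word n-grams over a handful of documents. It is not the project's actual pipeline, which ran at a much larger scale; the function names `ngrams` and `count_ngrams`, the simple tokenizer and the two sample sentences are all assumptions made for this example.

```python
from collections import Counter
import re


def ngrams(text, n):
    """Return the word n-grams of a text as space-joined strings.

    Tokenization here is a deliberately simple assumption:
    lowercase the text and keep runs of letters, digits and apostrophes.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(documents, n=2):
    """Count how often each n-gram occurs across a collection of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(ngrams(doc, n))
    return counts


# Hypothetical sample documents, made up for illustration only.
docs = [
    "Drones are used for surveillance.",
    "Armed drones are used in conflict zones.",
]
bigram_counts = count_ngrams(docs, n=2)
```

Counts like these, computed per time period or per data source, are the kind of statistic an interactive n-gram query interface could plot.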
Several interactive data visualizations were created using d3.js:
- A topic coupling graph that showed topics identified using LDA and their relations (example shown above)
- Interactive querying and visualization of n-gram statistics (example shown below)
- Parallel coordinate plots of relations between topics over different subsets of data