DataGorri

Figure 1: DataGorri Logo

Working with students at the Technical University of Munich, I developed an easy-to-use data scraper that allows researchers to automatically download tabular content. The software facilitates the collection of data that is freely available on the internet. It can be used free of charge, but we require that you cite the paper describing the software, which you can find below.

Figure 2: The modeler is used to create a page model that is applied to a list of websites. Here, the user can select the variables she is interested in.

Figure 3: The scraper runs through a list of predefined websites that contain formally identical tables and downloads data that has been selected in the page model.
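
The workflow described in the two captions above (select variables once in a page model, then apply that model to many formally identical tables) can be illustrated with a minimal Python sketch. This is not DataGorri's actual code or API: the names `TableParser`, `apply_page_model`, and `scrape`, the sample pages, and the idea of representing a page model as a list of column headers are all assumptions made for illustration. The sketch also assumes well-formed HTML with one table per page, a header row, and explicit closing tags, and it works on HTML strings rather than downloading anything.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect table rows as lists of cell strings (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def apply_page_model(html, wanted_columns):
    """Return one dict per data row, keeping only the selected columns."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    # Assumption: the page model is just a list of header names to keep.
    indices = [header.index(name) for name in wanted_columns]
    return [
        {name: row[i] for name, i in zip(wanted_columns, indices)}
        for row in body
    ]


def scrape(pages, wanted_columns):
    """Apply the same page model to every page in the list."""
    records = []
    for html in pages:
        records.extend(apply_page_model(html, wanted_columns))
    return records


# Two made-up pages with formally identical tables.
PAGE_A = """<table>
<tr><th>Country</th><th>Year</th><th>Value</th></tr>
<tr><td>DE</td><td>2017</td><td>3.7</td></tr>
</table>"""
PAGE_B = """<table>
<tr><th>Country</th><th>Year</th><th>Value</th></tr>
<tr><td>FR</td><td>2017</td><td>2.6</td></tr>
</table>"""

records = scrape([PAGE_A, PAGE_B], ["Country", "Value"])
```

Here `records` holds one dictionary per table row across both pages, restricted to the two selected columns. A real scraper would additionally fetch each page over the network and cope with malformed markup, which this sketch deliberately leaves out.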

The earliest versions of the software were developed in 2014. DataGorri is now written in Python, and its current version, 1.2, was released on June 20, 2018.

DataGorri is free of charge for academic purposes only. This includes academic research by students and faculty at publicly accredited universities. The only condition for using DataGorri is that its use be mentioned in any scholarly publication or presentation and that the companion paper be cited (you can find the paper here). Please see the license file for details.

The DataGorri executable requires a file contained in Microsoft’s Visual C++ Redistributable Package. This package is often already installed, but it is not part of a standard Windows installation. You can download it directly from Microsoft. We recommend the x86 version of the Microsoft Visual C++ Redistributable for Visual Studio 2017, which you can download here. You can also find the package on VisualStudio.com/downloads under “Other Tools and Frameworks”. Alternatively, you can download All in One Runtimes.

You can download the current release of DataGorri here.

You can find DataGorri’s version history here, and the FAQ here.

Contributors

The following people (in alphabetical order) have contributed to the current or previous versions of this software and agreed to be named as contributors:

Ivaylo Dimitrov
Matthias Franze
Julian Hackinger
Stefan Hentschel
Lukas Holzner
Florian Kreitmair
Daniel Krieger
Michael Legenc
Marc Müller