New task was created
Allow for processed data to be re-processed quickly next time
Created on Friday 5 February 2021, 06:01
ID: 642050
Project: Metabolism of Cities Data Hub
Status: Open
Priority: Medium
Type: Programming work
Assigned to: No one yet
Subscribers: Jens Peters, Paul Hoekman
Description
Preparing data for processing can take considerable time. Raw data often needs substantial manipulation to get it into the right shape and format. Once this work is done, the system can finally read and record the data. However, the same preparation has to be repeated each time the original author updates the dataset. It would be good to see if there are ways in which we can streamline this process.
Some possibilities include:
- Looking at common processing patterns (e.g. converting columns to rows) and trying to embed this functionality on the site, instead of having people do it in the spreadsheet.
- Somehow saving the processing as a formula/macro in the spreadsheet, so it can be re-run easily.
- Looking at Python scripts / Jupyter notebooks to help with the processing and thus allow for re-processing new data with the click of a button.
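To make the script option a bit more concrete, here is a minimal sketch of the "converting columns to rows" pattern using only the Python standard library. It is an illustration, not existing Data Hub code: the function name `wide_to_long`, the column names (`material`, `unit`, `period`, `quantity`) and the sample data are all hypothetical. The idea is that once such a step is written down as a script, it could be re-run unchanged when the original author publishes an updated file with the same layout.

```python
import csv
import io


def wide_to_long(rows, id_columns, var_name="period", value_name="quantity"):
    """Turn each non-identifier column into its own row (wide -> long).

    All names here are hypothetical; adapt them to the actual dataset.
    """
    long_rows = []
    for row in rows:
        for column, value in row.items():
            # Skip identifier columns and any surplus cells without a header.
            if column is None or column in id_columns:
                continue
            # Skip missing or empty cells rather than recording blanks.
            if value is None or value.strip() == "":
                continue
            record = {key: row[key] for key in id_columns}
            record[var_name] = column
            record[value_name] = value
            long_rows.append(record)
    return long_rows


# Hypothetical raw data: one column per year, as often found in spreadsheets.
RAW_CSV = """material,unit,2019,2020
Sand,t,120,135
Gravel,t,80,
"""

reader = csv.DictReader(io.StringIO(RAW_CSV))
long_rows = wide_to_long(reader, id_columns=["material", "unit"])

for record in long_rows:
    print(record)
```

In practice the input would come from the uploaded file rather than an inline string, and values would likely need type conversion and validation before being recorded. Whether this kind of step belongs in a script, a notebook, or the site itself is exactly what the example datasets should help decide.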
Let's start by collecting some examples of datasets that are time-consuming to process, so that we can make an informed decision on the best route forward. Anyone who has a good example, please post it below.