Automate data updates in production #521

Open
9 tasks
ivan-aksamentov opened this issue Apr 12, 2020 · 1 comment
Labels
help wanted Extra attention is needed IMPORTANT Take this immediately! s:data Scope: related to data retrieval, parsing, transformation, storage, update s:infra Scope: related to infrastructure, continuous integration, deployment t:feat Type: request of a new feature, functionality, enchancement

Comments

@ivan-aksamentov
Member

🙋 Feature Request

We want to update the data daily, but manual updates are very time-consuming and error-prone.

🔦 Context

😯 Describe the feature

We need a way to automate the data updates in production while also performing some basic sanity checks. Ideally, a human reviews each update before it goes live.

This flow should be decoupled from the general release cycle.

Data should be updated consistently across all long-lived branches (master, release, production), so that everyone is on the same page.

💻 Examples

💁 Possible Solution

  • implement a single-step build for the data updates so that it can be run in a CI environment

  • trigger the data update build step daily, using a GitHub Action (on the staging branch)

  • the bot automatically creates a branch and opens a pull request against the staging branch, containing the new data

  • a maintainer reviews the PR, as well as the results of the automatic checks and the deployed version of the application

  • the maintainer merges the PR, possibly adding more commits to it, or closes it

  • a PR is created against the master branch and automatically merged if possible

  • if not, the maintainer resolves conflicts in the master PR and merges it

  • the maintainer releases the data changes by fast-forwarding the release branch to staging

  • eventually, once we are confident in the reliability of the checks, the merge to staging can be automatic
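
The first step above, a single-step data update that CI can run, might look roughly like the following Python sketch. The `fetch_data` and `run_checks` callables, the JSON output format and the output path are all hypothetical placeholders, not existing parts of the project. The only point is that the whole update is one call whose return value can be used as the process exit code, so that CI knows whether to go on and open the PR.

```python
import json
import sys
from pathlib import Path
from typing import Callable, List


def update_data(
    fetch_data: Callable[[], dict],
    run_checks: Callable[[dict], List[str]],
    output_path: Path,
) -> int:
    """Fetch new data, validate it, and write it only if all checks pass.

    Returns a process exit code: 0 on success, 1 if any check failed.
    `fetch_data` and `run_checks` are hypothetical hooks supplied by the caller.
    """
    data = fetch_data()
    problems = run_checks(data)
    if problems:
        # Report every failed check so the reviewer sees them all in the CI log.
        for problem in problems:
            print(f"CHECK FAILED: {problem}", file=sys.stderr)
        return 1  # a non-zero exit code makes the CI job fail
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(json.dumps(data, indent=2, sort_keys=True))
    return 0
```

A CI job would call `sys.exit(update_data(...))` from a small entry-point script. Writing the file only after the checks pass means a failed run leaves the working tree untouched, so no PR with bad data gets created.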

Related

@ivan-aksamentov ivan-aksamentov added t:feat Type: request of a new feature, functionality, enchancement help wanted Extra attention is needed s:infra Scope: related to infrastructure, continuous integration, deployment s:data Scope: related to data retrieval, parsing, transformation, storage, update IMPORTANT Take this immediately! labels Apr 12, 2020
@noleti
Collaborator

noleti commented Apr 13, 2020

I think one step towards this would be to describe which manual checks @rneher currently performs when data is updated. Then we could see whether and how they can be automated. I assume they are checks along these lines:

  • If a new country is added:
    • Is there an appropriate header?
    • Is it placed in the correct folder?
    • Is it in the right format?
    • Does the data roughly make sense (monotonically rising for the respective features, continuous, etc.)?
  • If only new data is added:
    • Is the new data a 'continuation' of the old data?
      • Are cases, deaths, recovered >= previous values? (This might flag number revisions; how should we handle those?)
      • Are there huge jumps in the 'current'-type values (hospitalized, ICU)?
  • If old data is changed:
    • Can we see why old data was changed?
      • New column added?
      • Old data adjusted by a few values due to a correction at the source?
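
A few of these checks could probably be automated along the following lines. This is only a minimal Python sketch. The field names (`cases`, `deaths`, `recovered`, `hospitalized`, `icu`), the row layout (a list of dicts ordered by date) and the jump thresholds are assumptions for illustration, not the project's actual data format.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]

# Hypothetical field names; the real data files may use different ones.
CUMULATIVE_FIELDS = ("cases", "deaths", "recovered")
CURRENT_FIELDS = ("hospitalized", "icu")


def check_monotonic(rows: List[Row], fields=CUMULATIVE_FIELDS) -> List[str]:
    """Flag cumulative values that drop below the previous known value."""
    problems = []
    for field in fields:
        previous = None
        for i, row in enumerate(rows):
            value = row.get(field)
            if value is None:
                continue  # missing values are skipped, not flagged
            if previous is not None and value < previous:
                problems.append(f"row {i}: {field} decreased from {previous} to {value}")
            previous = value
    return problems


def check_jumps(rows: List[Row], fields=CURRENT_FIELDS,
                max_ratio: float = 3.0, min_value: float = 10) -> List[str]:
    """Flag 'current'-type values that change by more than max_ratio in one step.

    Steps where both values are below min_value are ignored as noise.
    Both thresholds are arbitrary placeholders.
    """
    problems = []
    for field in fields:
        previous = None
        for i, row in enumerate(rows):
            value = row.get(field)
            if value is None:
                continue
            if previous is not None and max(previous, value) >= min_value:
                low, high = sorted((previous, value))
                if high > max_ratio * max(low, 1):
                    problems.append(f"row {i}: {field} jumped from {previous} to {value}")
            previous = value
    return problems


def check_history_unchanged(old_rows: List[Row], new_rows: List[Row]) -> List[str]:
    """Flag differences between old data and the same rows in the new data.

    Rows are compared by position, and only fields present in the old row
    are compared, so a newly added column is not reported as a change.
    """
    problems = []
    if len(new_rows) < len(old_rows):
        problems.append(f"new data has {len(new_rows)} rows, old data had {len(old_rows)}")
    for i, (old, new) in enumerate(zip(old_rows, new_rows)):
        for field, old_value in old.items():
            if new.get(field) != old_value:
                problems.append(f"row {i}: {field} changed from {old_value} to {new.get(field)}")
    return problems
```

Each function returns a list of human-readable messages instead of raising, so a decrease caused by a legitimate revision at the source shows up as a flag for the reviewer to judge and does not block the update outright.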

@ivan-aksamentov ivan-aksamentov added this to the 1.2 milestone Apr 17, 2020
@ivan-aksamentov ivan-aksamentov mentioned this issue Apr 17, 2020
36 tasks
@ivan-aksamentov ivan-aksamentov removed this from the 1.2 milestone May 1, 2020