Bulk of improvement suggestions #855

Open
mtsfer opened this issue Sep 25, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

Contributor

mtsfer commented Sep 25, 2024

#822 is important and suggests great improvements to the project. However, those improvements require a lot of collective effort, and they are nearly impossible to make given the current structure of the repository and the existing issues in the dataset.

By collective effort, I include people who do not know how to work with SQL. JSON is a far more readable format for laypeople. If contributions could be made in the JSON files, more people would be able to help by adding new places to the dataset and fixing inconsistencies.

Here are some suggestions for the project (in order of urgency, from my point of view):

  • Allow contributions in JSON (actually, this should be the default way): Updating records in an inlined SQL insert statement is counterproductive. I would suggest moving contributions to JSON. From there, you could easily create the SQL insert statements and generate the dataset in other file formats too.
  • Reduce the size of the repository: There is almost 2 GB of data in this repo, and cloning it to contribute is a huge pain, which partially explains the low number of contributors. Most of the data is repeated across multiple file formats, and repeated again in permutations of place types (e.g. countries+cities, countries+states, countries+states+cities). I'm quite sure that individual files would be more than sufficient. Making the data available only in the most used file formats would also help.
  • Normalize the database: There are some inconsistencies in the dataset caused by inadvertent denormalization. With the database normalized, it would also be easier to identify problems and fix them.
  • Include translations for cities and states.
  • Introduce more specific places to the dataset, as suggested in Incoherence between the data across tables #822.
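
To illustrate the first suggestion, here is a rough sketch of how SQL insert statements could be generated from JSON records. It is not the project's actual tooling: the `cities` table name, the `id`, `name` and `state_id` fields, and the helper functions `sql_literal` and `json_to_inserts` are all hypothetical, and the quoting is deliberately simplistic (a real exporter should rely on a database driver's parameter binding).

```python
import json


def sql_literal(value):
    """Render a Python value as a SQL literal (illustrative quoting only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape single quotes by doubling them, as in standard SQL strings.
    return "'" + str(value).replace("'", "''") + "'"


def json_to_inserts(table, records):
    """Build one INSERT statement per JSON record (a dict of column -> value)."""
    statements = []
    for record in records:
        columns = ", ".join(record)
        values = ", ".join(sql_literal(v) for v in record.values())
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return statements


# Hypothetical records, standing in for a contributor-edited JSON file.
cities_json = """
[
  {"id": 1, "name": "Example City", "state_id": 10},
  {"id": 2, "name": "O'Example", "state_id": null}
]
"""

statements = json_to_inserts("cities", json.loads(cities_json))
for statement in statements:
    print(statement)
```

With something along these lines, contributors would only edit the JSON, and the SQL dump (or any other export format) would become a build artifact rather than a file edited by hand.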

@dr5hn Thanks a lot for this project, it's a gem.

@dosubot dosubot bot added the enhancement New feature or request label Sep 25, 2024
@mtsfer mtsfer changed the title Bulk of improvement suggestions to the project Bulk of improvement suggestions Sep 25, 2024