Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incoherence between the data across tables #822

Open
dugzino opened this issue Aug 12, 2024 · 1 comment
Open

Incoherence between the data across tables #822

dugzino opened this issue Aug 12, 2024 · 1 comment
Labels
bug Something isn't working enhancement New feature or request size:XXL This PR changes 1000+ lines, ignoring generated files.

Comments

@dugzino
Copy link

dugzino commented Aug 12, 2024

Issue

There's a big issue with the data.
The table cities contains city and town data.
The table states contains state and city data.
And the link between them are wrong.

Found issues in Belgium, Albania, & Japan. So, there probably are in other countries.

Examples

Example 1

For instance in Belgium:

  • In states you can find "Antwerp". Which is incorrect as "Antwerp" is a city.
  • In cities you can find "Antwerp" and linked to it's parent "Flanders". Which is correct.
  • In cities you can find "Borgerhout" which is linked to the state (region) "Flanders". Which is incorrect. "Borgerhout" is a town in the city of "Antwerp".

So, the data for Belgium is completely messed up. Pretty much 90% of it and even fixing the bad IDs won't fix the other issue which is lack of tables and fed data.

Example 2

This time in Albania.
We have duplicates in states like "Tirana County" and "Tirana District".
"Tirana" (or "Tirana district", but it's really just "Tirana") is a city in the state of "Tirana County".
Even though we never make use of the state "Tirana District" it's still there.
We then have "Kavajë District" in states when it's actually the city "Kavajë" and should be in cities.
Also "Bashkia Kavajë" is not even a town. It's even lower level. It's a place in a town.

Solution

The solution would be a whole refactor of the tables and data. Which is huge, but necessary.
It should be like:

continents(
  name: string,
);

regions(
  name: string,
  continent_id: Continent,
);

countries(
  name: string,
  region_id: Region,
);

states(
  name: string,
  country_id: Country,
);

cities(
  name: string,
  country_id: Country,
  state_id: State | null,
);

towns(
  name: string,
  city_id: City,
);

places(
  name: string,
  town_id: Town,
);

Doing this would totally make things a lot more simpler and cleaner to add, edit, and more importantly make use of it.
I wouldn't mind doing a POC, but I got my hands full right now. This should become the next version of this repo.

Copy link

dosubot bot commented Aug 12, 2024

Hey @dugzino! I'm here to help you with any bugs, questions, or contributions you have for the repository. Let me know how I can assist you!

The issue you've identified highlights significant inconsistencies in the data structure of the countries-states-cities-database. Specifically, the misclassification of cities and towns in the states and cities tables, and the incorrect links between them.

Current Issues:

  1. Misclassification:

    • Belgium: "Antwerp" is incorrectly listed as a state instead of a city. "Borgerhout" is a town in the city of "Antwerp" but is incorrectly linked to the state "Flanders".
    • Albania: Duplicates like "Tirana County" and "Tirana District". "Kavajë District" should be a city, not a state. "Bashkia Kavajë" is a place within a town, not a town itself.
  2. Incorrect Links:

    • Cities and towns are linked to incorrect parent states or regions.

Proposed Solution:

A comprehensive refactor of the database schema to introduce more granular tables and correct the hierarchical relationships. The proposed schema is:

continents(
  name: string,
);

regions(
  name: string,
  continent_id: Continent,
);

countries(
  name: string,
  region_id: Region,
);

states(
  name: string,
  country_id: Country,
);

cities(
  name: string,
  country_id: Country,
  state_id: State | null,
);

towns(
  name: string,
  city_id: City,
);

places(
  name: string,
  town_id: Town,
);

This schema would ensure a clear and accurate representation of geographical entities and their relationships, making the data easier to manage and use.

Next Steps:

  • Data Cleanup: Correct the existing data to match the new schema.
  • Schema Refactor: Implement the new schema in the database.
  • Data Migration: Migrate the existing data to the new schema, ensuring all links and classifications are correct.

This refactor would address the current issues and provide a more robust structure for future data additions and modifications. If you need further assistance or a detailed plan for the refactor, feel free to ask!

To continue talking to Dosu, mention @dosu.

@dosubot dosubot bot added bug Something isn't working enhancement New feature or request labels Aug 12, 2024
@dr5hn dr5hn added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

No branches or pull requests

2 participants