consolidate_intersections makes non-integer node IDs which result in invalid osm xml #647
Comments
@parkertimmins good point. The .osm formatted XML should have integer node IDs. The question is how to best implement that, given topological node consolidation. I see two options:
The purpose of this format is to retain meaningful information about subclusters, which may be useful in subsequent analytics. For example, it indicates that nodes |
I agree with that. Since there's useful information in the IDs, and the integer requirement is specific to .osm xml, it makes sense to enforce this in save_graph_xml. What do you think about something like the following
    if not gdf_nodes["id"].astype(str).str.isdigit().all():
        gdf_nodes["int_id"] = gdf_nodes["id"].astype(str).factorize()[0]
        gdf_edges["u"] = pd.merge(gdf_edges, gdf_nodes, how="left", left_on="u", right_on="id")["int_id"]
        gdf_edges["v"] = pd.merge(gdf_edges, gdf_nodes, how="left", left_on="v", right_on="id")["int_id"]
        gdf_nodes["id"] = gdf_nodes["int_id"]
        gdf_nodes.drop(columns=["int_id"], inplace=True)
in osm_xml at: Line 199 in c2f55a3
Are there any fields that I'm missing that need to be updated?
Alternatively, in an effort to diff an xml with the multipart ID and an integer ID I wrote the following. It keeps the IDs as is when possible, by replacing "45-0" with "45" and giving every suffix >= 1 a new ID. It's a dumb amount of complexity just to keep the IDs the same, so probably not a good idea.
    if not gdf_nodes["id"].astype(str).str.isdigit().all():
        gdf_nodes[['cluster_id', 'suffix']] = gdf_nodes['id'].astype(str).str.split("-", expand=True)
        gdf_nodes['suffix'] = gdf_nodes['suffix'].fillna(0).astype(int)
        gdf_nodes['cluster_id'] = gdf_nodes['cluster_id'].astype(int)
        gdf_nodes.sort_values(by=['suffix', 'cluster_id'], kind='mergesort', inplace=True)
        gdf_nodes["int_id"] = gdf_nodes["id"].factorize(sort=False)[0]
        gdf_edges["u"] = pd.merge(gdf_edges, gdf_nodes, how="left", left_on="u", right_on="id")["int_id"]
        gdf_edges["v"] = pd.merge(gdf_edges, gdf_nodes, how="left", left_on="v", right_on="id")["int_id"]
        gdf_nodes["id"] = gdf_nodes["int_id"]
        gdf_nodes.drop(columns=["int_id"], inplace=True) |
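For readers who want to see the ID-preserving idea in isolation, here is a minimal dependency-free sketch. It is not the code from the comment above: the function name `assign_integer_ids` is hypothetical, and it assumes labels are non-negative and look like `45` or `"45-1"`, and that a bare `45` and `"45-0"` never both occur. Labels with suffix 0 (or no suffix) keep their cluster number, and every label with suffix >= 1 gets a fresh ID above the largest kept one.

```python
def assign_integer_ids(node_ids):
    """Map labels such as 45, "45-0", "45-1" to integers (hypothetical helper).

    Assumes non-negative cluster numbers and that a bare cluster label and
    its "-0" form never appear together.
    """
    parsed = []
    for label in node_ids:
        cluster, _, suffix = str(label).partition("-")
        parsed.append((int(suffix or 0), int(cluster), label))

    # Suffix 0 (or no suffix): keep the cluster number as the integer ID.
    mapping = {label: cluster for suffix, cluster, label in parsed if suffix == 0}

    # Suffix >= 1: hand out fresh IDs above every ID that was kept.
    next_id = max(mapping.values(), default=-1) + 1
    extras = [p for p in parsed if p[0] > 0]
    for suffix, cluster, label in sorted(extras, key=lambda p: p[:2]):
        mapping[label] = next_id
        next_id += 1
    return mapping


ids = [0, "1-0", "1-1", 2, "3-0", "3-1", "3-2"]
print(assign_integer_ids(ids))
```

The edge endpoints would then be rewritten through the same mapping, which is the part the `pd.merge` calls handle in the GeoDataFrame version.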
Probably more efficient to do the node relabeling directly in NetworkX prior to converting the graph to GeoDataFrames:
    try:
        np.array(G.nodes).astype(np.int64)
    except ValueError:
        G = nx.convert_node_labels_to_integers(G)
Though this does make me wonder now if maybe we should just do this universally in the |
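The "relabel only if some label is not an integer" pattern can be sketched without NetworkX. In the sketch below, a plain node list and edge list stand in for the graph, and `relabel_if_needed` is a hypothetical helper, not an OSMnx or NetworkX function. It renumbers nodes 0..n-1 in list order when any label fails integer conversion, and otherwise leaves the input untouched.

```python
def relabel_if_needed(nodes, edges):
    """Renumber nodes to 0..n-1 only if some label is not integer-like.

    nodes: list of node labels; edges: list of (u, v) pairs.
    Returns (nodes, edges, mapping); mapping is empty when nothing changed.
    """
    try:
        for label in nodes:
            int(label)
    except (TypeError, ValueError):
        # A label such as "1-0" cannot be read as an integer: renumber all nodes.
        mapping = {old: new for new, old in enumerate(nodes)}
        new_nodes = [mapping[n] for n in nodes]
        new_edges = [(mapping[u], mapping[v]) for u, v in edges]
        return new_nodes, new_edges, mapping
    return list(nodes), list(edges), {}


nodes = [0, "1-0", "1-1", 2]
edges = [(0, "1-0"), ("1-1", 2)]
print(relabel_if_needed(nodes, edges))
```

Returning the mapping keeps the old labels recoverable, at the cost of one extra dictionary. Note that a digit string such as `"7"` passes the check and is left as a string, so a stricter variant might test `isinstance(label, int)` instead.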
More efficient and way prettier! |
The more I think about it, the more I lean toward keeping it simple here (ie, not retaining additional attributes). This information may have been more useful to me in testing than to the public for analytics. And, anyone interested in the geometric vs topological consolidation difference can diff the results with |
That makes sense; it may not be worth the additional complexity. In that case, I think just
after separating the weakly connected components will be adequate. |
Is there much difference in runtime or outcome if we do this via
    gdf["cluster"] = gdf["cluster"].factorize()[0]
versus rebuilding the graph and then relabeling with
    G = nx.convert_node_labels_to_integers(G)
I guess as long as they both produce the same graph in the end, we can just use whichever option is faster. |
@parkertimmins would you like to contribute this fix as a PR? |
@gboeing Yes, sorry I never got back to this. I'll test the options we discussed above and make a PR for the faster one. |
Ran consolidate_intersections on a graph with 21724 nodes and 43290 edges: Based on this, I'll make a PR with the first option |
Thanks! |
- as per gboeing/osmnx#647 saving consolidated clusters gives invalid integers as nodes when reloading graphs so... just make sure that you either preserve what's there, or recalculate integers. Thanks gboeing.
- cut nearly a minute out of the load time for the standard graph load.
See also #1135 |
Problem description
Loaded place graph, projected graph, consolidated intersections, saved to osm xml file.
Resulting xml file should have integer node IDs, per https://wiki.openstreetmap.org/wiki/Elements
Resulting xml has some node IDs of the form: "123-0", "1-1", etc.
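One way to confirm the symptom on a saved file is to scan the XML for node IDs that do not parse as integers. The sketch below uses only the standard library. The sample document and the helper name `find_non_integer_node_ids` are made up for illustration and are not OSMnx output or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature .osm document; two IDs mimic consolidated-cluster labels.
SAMPLE_OSM = """<osm version="0.6">
  <node id="123" lat="0.0" lon="0.0" />
  <node id="123-0" lat="0.0" lon="0.1" />
  <node id="1-1" lat="0.1" lon="0.1" />
</osm>"""


def find_non_integer_node_ids(xml_text):
    """Return the node IDs in an OSM XML string that do not parse as integers."""
    root = ET.fromstring(xml_text)
    bad_ids = []
    for node in root.iter("node"):
        node_id = node.get("id", "")
        try:
            int(node_id)
        except ValueError:
            bad_ids.append(node_id)
    return bad_ids


print(find_non_integer_node_ids(SAMPLE_OSM))  # ['123-0', '1-1']
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`.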
Environment information
Provide a complete minimal reproducible example
Possible Solution
When consolidated intersections are clustered geometrically, and it turns out there are actually multiple weakly connected components, each component is given an ID made from the original cluster ID and an integer index:
osmnx/osmnx/simplification.py
Line 533 in c2f55a3
After separating the weakly connected components, these multi-part string IDs can be replaced with new integer IDs as follows
I couldn't find any existing code that depends on the format of these IDs, but I could certainly be missing it.