Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CONSULT-469] add initial code for field parsing #377

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

gvaradarajan
Copy link
Member

@gvaradarajan gvaradarajan commented Jan 8, 2025

First of ~4 PRs for the NMEA 2000 parsing library. Introduces a macro for defining enums representing the various lookup fields present in messages and readers that parse specific data types from a byte slice. These readers will be used by the code generated with the proc macro appearing in the next PR.

Note: Going forward, to better visualize the code generated by the macros, I recommend using cargo expand (cargo install cargo-expand to install, cargo expand to run) in the micro-rdk-nmea directory. You can use it in this PR to see how the lookup macro expands but it will be more important in the subsequent PR introducing the proc macros

@gvaradarajan gvaradarajan requested a review from a team as a code owner January 8, 2025 23:44
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
micro-rdk = { path = "../micro-rdk", features = ["native"] }
Copy link
Member Author

@gvaradarajan gvaradarajan Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my great annoyance, we've gone back to a state where micro-rdk cannot be built as a feature-less library. I'll make a ticket for this and link it here

EDIT: Ticket filed (https://viam.atlassian.net/browse/RSDK-9710)

Copy link
Member

@acmorrow acmorrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with a first pass. I haven't taken too close a look at the implementation details of the bit bashing yet, as I'd like to fully understand how the types work together and I'm not quite there yet, but here are my notes so far.

Cargo.toml Outdated
@@ -9,6 +9,7 @@ members = [
"micro-rdk-ffi",
"examples/modular-drivers",
"etc/ota-dev-server",
"micro-rdk-nmea",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we sort this list?

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
micro-rdk = { path = "../micro-rdk", features = ["native"] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can say micro-rdk = { workspace = true, features= ... here - the top level Cargo.toml imports micro-rdk.

@@ -0,0 +1,5 @@
# The micro-RDK NMEA 2000 Message Parser (WIP)

This is a library that will eventually supply logic to parse NMEA 2000 messages from byte data into
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just "This libraries supplies ..."

@@ -63,7 +63,6 @@ pub enum AppClientError {
AppConfigHeaderDateMissingError,
#[error(transparent)]
AppGrpcClientError(#[from] GrpcClientError),
#[cfg(feature = "data")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional? I don't see any other references to this type in the review.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, I found this issue when I tried to build featureless micro-RDK. The issue is that config monitor now uses this error, so trying to build without the data feature fails

@@ -0,0 +1,328 @@
pub trait Lookup: Sized {
fn from_value(val: u32) -> Self;
fn to_string(&self) -> String;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • For to_string, could/should we use the ToString trait?
  • For from_value is there another trait that could be used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • yes, I meant to make that change and forgot
  • I could implement From<u32>, is that desired?

/// For generating a lookup data type found in an NMEA message. The first argument is the name of the
/// enum type that will be generated. Each successive argument is a tuple with
/// (raw number value, name of enum instance, string representation)
macro_rules! lookup {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same feeling on the name of this macro as for the trait.

type FieldType = $t;

fn read_from_data(&self, data: &[u8], start_idx: usize) -> Result<(usize, Self::FieldType), NmeaParseError> {
let type_size = std::mem::size_of::<$t>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does FieldType here work instead of $t? In the below usages? I just think it is easier if there are fewer places the macro parameters are used.

&self,
data: &[u8],
start_idx: usize,
) -> Result<(usize, Self::FieldType), NmeaParseError>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Could this take self rather than &self?
  • Could it return a new FieldReader at the advanced position rather than returning the offset?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • is there a reason we want it consumed?
  • I'm not convinced this is easy to accomplish because something that implements the FieldReader trait is unaware of the data type of the next field

}
}

macro_rules! number_field {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe generate_number_field_readers or similar.

&self,
data: &[u8],
start_idx: usize,
) -> Result<(usize, Self::FieldType), NmeaParseError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thoughts as above on signature here regarding self and the usize return.

Copy link
Member

@npmenard npmenard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with first pass;
I don't like having function signature be like Result<(usize, Self::FieldType),...> it means wee need logic to keep track where we are in the stream of bytes which i believe would be error prone.
Did you consider using a Reader at the bit level? perhaps https://docs.rs/bitreader/latest/bitreader/ can be an option

let mut end_idx = start_idx;
let field_reader: NumberField<T> = Default::default();
for i in 0..N {
let (bytes_read, next_elem) = field_reader.read_from_data(data, end_idx)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will panic if N*sizeonf(T)>len(data) i can't see where a bound check would occur

}
}

impl<T, const N: usize> FieldReader for ArrayField<T, N>
Copy link
Member

@npmenard npmenard Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having N as parameter in the template will lead to unique codegen for each pair of (T,N) I am not sure if the N is extremely important here can we get away with a reference slice? Otherwise we can leave it as is I am just concerned about overall code size

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that's true, I think it will only code gen for each actually referenced (T, N), but I can double-check

(6, Integrated, "integrated"),
(7, Surveyed, "surveyed"),
(8, Galileo, "Galileo"),
CouldNotParse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CouldNotParse feels like it should be an error vs an actual field. If we want to represent an Unknown LookupField then I would suggest naming it UnknowLookupField or alike

}
x if x < 32 => {
let shift = 32 - x;
let raw_val = u32::from_le_bytes(data_slice.try_into()?);
Copy link
Member

@npmenard npmenard Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will not work for bit_size in [16;23] as (self.bit_size / 8) == 2 and bit_size ==32 as (self.bit_size / 8) == 4. And for bit_size ==16` as (self.bit_size / 8) == 2

let raw_val = u32::from_le_bytes(data_slice.try_into()?);
raw_val >> shift
}
_ => unreachable!("lookup field raw value cannot be more than 32 bits"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not instantiate LookupField if bit_size > 32

@gvaradarajan gvaradarajan requested a review from npmenard January 15, 2025 21:00
Copy link
Member

@acmorrow acmorrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a pass through. It mostly makes sense, but I'll need to spend some more time on generate_number_field_readers to really get what is going on there. Some questions and suggestions as well.

$default
}

impl From<u32> for $name {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be TryFrom? Then you wouldn't need the $default:ident or associated enum member and could just return an error.

Copy link
Member Author

@gvaradarajan gvaradarajan Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this be the behavior because the gonmea library does not fail on an unrecognized lookup value

/// child-instances of `DataRead` or calling the `advance` function
pub struct DataCursor<'a> {
data: &'a [u8],
bit_position: Rc<AtomicUsize>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why atomics are making an appearance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a reason, but I have since forgotten it. I'll just remove it, it's not like the rest of the code is thread-safe anyway

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I've remembered again. Rust will not let me mutate the content of an Rc in the drop for DataRead because it doesn't consider it safe to mutate it when the reference count isn't 0. I have to get around this with the interior mutability of AtomicUsize

}
}

fn read(&self, bits: usize) -> Result<DataRead, NmeaParseError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you call read twice without dropping the DataRead object that resulted from the first read before the second call to read, what happens?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be very bad. They could both read from the same starting place but potentially different ending places. Even if they do read to the same end position, when they're both dropped the bit position will advance twice as far as it should. Should I protect against this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, you would leverage the type system to make such a situation impossible. Just as a for instance, you could do something similar to how ScopeGuard works, where calling read on a DataCursor consumes the DataCursor and hands you a DataRead object that holds an inner DataCursor, which you must recover via into_inner or similar, which in turn consumes the DataRead and restores the DataCursor. That way, you go back and forth between them, but you can't end up with two DataRead objects obtained on the same DataCursor object. That might also let you get rid of the Rc? I'll bet @npmenard would have some even better ideas about how to structure it. This is just me thinking aloud.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think read should consume data and update the Cursor in place if we need to read part of a byte then a peek function would be more suitable.

}
}

pub fn advance(&self, bits: usize) -> Result<(), NmeaParseError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could advance be implemented by calling read and throwing away the result?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, is that preferred?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, because then advance is very very simple, and the complexity lives only in read.

let enum_value = match self.bit_size {
x if x <= 8 => Ok::<u32, NmeaParseError>(u8::try_from(data_read)? as u32),
x if x <= 16 => Ok::<u32, NmeaParseError>(u16::try_from(data_read)? as u32),
x if x <= 32 => Ok::<u32, NmeaParseError>(u16::try_from(data_read)? as u32),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be u32::try_from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that that should be equivalent. It wasn't, which revealed a bug in number parsing. Adding another number field test and changing this to u32::try_from

@gvaradarajan gvaradarajan requested a review from acmorrow January 17, 2025 16:58
Copy link
Member

@npmenard npmenard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with first pass, i have some questions about the bit manipulations as I don't understand fully what's going on.

impl TryFrom<DataRead<'_>> for $t {
type Error = NumberFieldError;
fn try_from(value: DataRead) -> Result<Self, Self::Error> {
let max_size = std::mem::size_of::<Self>();
Copy link
Member

@npmenard npmenard Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to add a check because if max_size*8 < value.bit_size the function will panic

#[error(transparent)]
TryFromSliceError(#[from] std::array::TryFromSliceError),
#[error("end of buffer exceeded")]
EndOfBufferExceeded,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: NotEnoughData?

}
}

fn read(&self, bits: usize) -> Result<DataRead, NmeaParseError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think read should consume data and update the Cursor in place if we need to read part of a byte then a peek function would be more suitable.

// incomplete byte so it can be padded with zeros, then we reverse the bits
// of the now complete last byte for the proper bit order
let last_bit_start = bit_vec.len() - (value.bit_size % 8);
let _ = &bit_vec[last_bit_start..].reverse();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of this operation? for example if I want to read 12bit on a 16bit number then last_bit_start would be 8 so the reverse operation will affect bits [8..12]. I think this ends up doing something like shift the left most bits right overwriting on the way.
Also different value of bit_size will change the amount of bits reversed can that affect reading numbers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what I'm trying to do, yes. The reversing is just to be able to pad the bits with 0 from the front. So, if the overflow bits are 1110, I want to make it so the last byte is 00001110. To do that I reverse just the last 4 bits in place (0111), add the padding (01110000), then reverse the last byte to restore the bit order (00001110)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are suggesting I could just right shift the last byte by however many bits are remaining?

}

struct DataRead<'a> {
data: &'a [u8],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should own the buffer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants