DyldCache: add the ability to iterate mappings and relocations #738
…dern dyld shared cache's
src/pod.rs
Outdated
if (ptr as usize) % mem::align_of::<T>() != 0 {
    return Err(());
}
// if (ptr as usize) % 8 != 0 { |
I was unable to read certain structures from my local dyld shared cache because of this alignment requirement. My understanding is that the alignment should be that of the primitive types within the structure, but even that doesn't appear to be correct, as only an alignment of 2 seemed to work.
Which structures and fields in particular? If they are specific to dyld shared cache, then those fields should use types such as U32Bytes instead of U32. If it's Mach-O structures, for example, then it may be that the dyld shared cache doesn't align them the same way as normal files, and the unaligned feature of this crate is required.
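To make the difference between the two kinds of field concrete, here is a minimal sketch using only the standard library. `RawU32`, `Aligned`, and `check_layout` are hypothetical names invented for this illustration; `RawU32` only stands in for a byte-array integer such as U32Bytes and is not the crate's type. The check mirrors the alignment test quoted from src/pod.rs.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for a byte-array integer such as `U32Bytes`:
/// the value is stored as raw bytes, so the type has alignment 1.
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct RawU32(pub [u8; 4]);

impl RawU32 {
    /// Decode the stored bytes as a little-endian `u32`.
    pub fn get_le(self) -> u32 {
        u32::from_le_bytes(self.0)
    }
}

/// Test helper: a buffer whose first byte is guaranteed to be 8-byte aligned.
#[repr(align(8))]
pub struct Aligned(pub [u8; 16]);

/// The same kind of check as the quoted `pod.rs` code: reject data whose
/// address is not a multiple of `T`'s alignment, or that is too short.
pub fn check_layout<T>(data: &[u8]) -> Result<(), ()> {
    if (data.as_ptr() as usize) % align_of::<T>() != 0 {
        return Err(());
    }
    if data.len() < size_of::<T>() {
        return Err(());
    }
    Ok(())
}
```

With an 8-byte-aligned buffer, a read of a native `u32` starting at byte 1 is rejected, while the byte-array wrapper is accepted at any offset.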
I'm getting it when reading the slide info:
let slide_info_file_offset = self.slide_info_file_offset.get(endian);
let version = data
    .read_at::<U32<E>>(slide_info_file_offset)
    .read_error("Invalid slide info file offset size or alignment")?
    .get(endian);
match version {
    5 => {
        let slide = data
            .read_at::<macho::DyldCacheSlideInfo5<E>>(slide_info_file_offset)
            .read_error("Invalid dyld cache slide info size or alignment")?;
        let page_starts_offset = slide_info_file_offset
            .checked_add(mem::size_of::<macho::DyldCacheSlideInfo5<E>>() as u64)
            .unwrap();
        let page_starts = data
            .read_slice_at::<U16<E>>(
                page_starts_offset,
                slide.page_starts_count.get(endian) as usize,
            )
            .read_error("Invalid page starts size or alignment")?;
        Ok(Some(DyldCacheSlideInfoSlice::V5(slide, page_starts)))
    }
    _ => todo!("handle other dyld_cache_slide_info versions"),
}
which has this format:
pub struct DyldCacheSlideInfo5<E: Endian> {
    pub version: U32<E>, // currently 5
    pub page_size: U32<E>, // currently 4096 (may also be 16384)
    pub page_starts_count: U32<E>,
    reserved1: u32,
    pub value_add: U64<E>,
}
If the slide_info_file_offset you're seeing only has 32-bit alignment then value_add needs to change to U64Bytes. And if it only has 8-bit alignment then you need to change the U32 as well.
Looks like it was actually just the alignment of the reserved1 field that was the issue.
Sounds like you're already using the unaligned feature then (which only affects U32 and U64, not u32), so the other fields technically should be changed too so that it works without that feature.
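The effect of a single native field can be sketched with two toy structs. This is an illustration under assumptions, not the crate's definitions: `RawU32` and `RawU64` are hypothetical alignment-1 stand-ins for what U32 and U64 behave like when the unaligned feature is on, and the field names are copied from the struct quoted earlier in the thread.

```rust
/// Hypothetical alignment-1 integers, standing in for `U32`/`U64`
/// as they behave with the `unaligned` feature enabled.
#[repr(transparent)]
pub struct RawU32(pub [u8; 4]);
#[repr(transparent)]
pub struct RawU64(pub [u8; 8]);

/// One native `u32` field raises the alignment of the whole struct to 4.
#[repr(C)]
pub struct SlideInfoPlainReserved {
    pub version: RawU32,
    pub page_size: RawU32,
    pub page_starts_count: RawU32,
    pub reserved1: u32, // native integer: alignment 4
    pub value_add: RawU64,
}

/// With every field stored as bytes, the struct can be read at any offset.
#[repr(C)]
pub struct SlideInfoAllBytes {
    pub version: RawU32,
    pub page_size: RawU32,
    pub page_starts_count: RawU32,
    pub reserved1: RawU32, // byte array: alignment 1
    pub value_add: RawU64,
}
```

Both structs have the same size and field offsets; only the required alignment differs, which is why changing reserved1 alone was enough to make the read succeed.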
I don't think I am, unless it's the default? I have just added the dependency with cargo add object and changed the path to the local dir I have object checked out to.
Yes it's the default.
I'll need to look in detail at these changes later, but there are a couple of things I would like changed which will probably affect some of the rest.
What testing have you done for this? Are you able to show me some code that demonstrates how all of this is used?
For testing I have been using a very simple script that runs the functions I am after:
use anyhow::{Context, Result};
use memmap2::Mmap;
use object::macho;
use object::read::macho::DyldCache;
use object::Endianness;
use std::fs;
use std::mem::forget;
use std::path::{Path, PathBuf};

fn map(path: PathBuf) -> Result<&'static [u8]> {
    let file = fs::File::open(path).context("failed to open cache path")?;
    let file = unsafe { Mmap::map(&file) }.context("failed to map cache file")?;
    let data = &*file as *const [u8];
    // Leak the mapping so the slice stays valid for 'static (acceptable in a throwaway script).
    forget(file);
    let data = unsafe { data.as_ref() }.context("cache map is null")?;
    Ok(data)
}

fn main() -> Result<()> {
    let cache_root = Path::new("/System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld");
    let cache = map(cache_root.join("dyld_shared_cache_arm64e"))?;
    let subcaches = &[map(cache_root.join("dyld_shared_cache_arm64e.01"))?];
    let cache = DyldCache::<Endianness>::parse(cache, subcaches)?;
    // Print only relocations signed with the IA key, zero diversity, no address diversity.
    for reloc in cache.relocations() {
        if let Some(ref auth) = reloc.auth {
            match (auth.key, auth.diversity, auth.addr_div) {
                (macho::PtrauthKey::IA, 0u16, false) => {
                    dbg!(reloc);
                }
                _ => {}
            }
        }
    }
    for mapping in cache.mappings() {
        dbg!(mapping);
    }
    Ok(())
}
I'm not sure how to do any testing with CI as the files are prohibitively large to put in the test binaries repo. I've done a lot of manual comparison with the output of
I'm having trouble understanding the use of
Oh yes I can see why that's a little awkward. I should add the address to the
I've looked a bit more and I still don't see the reason for
They can do both, and
Can we instead design it so that it's easy for the user to do something like
Sorry, Rust is quite new to me and I'm not sure what is wrong with that, or what the best approach is. The requirement to box the iterators came because the mappings optionally contain slide information (based on the version of the mapping info, or the mapping itself), not because the iterators are nested?
Removing the reference from
It's doing a bunch of memory allocations for something that shouldn't need any memory allocations at all. So while it works, it seems to me that it could be designed better. So
pub fn mappings<'cache>(
    &'cache self,
) -> impl Iterator<Item = DyldCacheMapping<'data, E, R>> + 'cache {
    self.mappings.iter().chain(self.subcaches.iter().flat_map(|subcache| subcache.mappings.iter()))
}
I did try doing the same thing for
pub fn relocations<'cache>(&'cache self) -> impl Iterator<Item = DyldRelocation> + 'cache {
    self.mappings().flat_map(|mapping| mapping.relocations())
}
Or we could leave out I'm sure we could replace the
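The allocation-free shape suggested here can be tried out on toy data. The sketch below is an assumption-laden illustration, not the PR's code: `Reloc`, `Mapping`, `SubCache`, and `Cache` are hypothetical simplified types, and relocation decoding is reduced to adding an offset to a base address. The backing `Vec`s exist only to hold the toy data; the iterators themselves neither box nor allocate.

```rust
/// Hypothetical, simplified relocation: just a resolved address.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Reloc {
    pub address: u64,
}

/// Hypothetical mapping that borrows its relocation offsets from the file data.
#[derive(Debug, Clone, Copy)]
pub struct Mapping<'data> {
    pub address: u64,
    pub relocs: &'data [u64],
}

impl<'data> Mapping<'data> {
    /// Takes `self` by value so the iterator borrows only from `'data`.
    pub fn relocations(self) -> impl Iterator<Item = Reloc> + 'data {
        let base = self.address;
        self.relocs.iter().map(move |&offset| Reloc {
            address: base.wrapping_add(offset),
        })
    }
}

pub struct SubCache<'data> {
    pub mappings: Vec<Mapping<'data>>,
}

pub struct Cache<'data> {
    pub mappings: Vec<Mapping<'data>>,
    pub subcaches: Vec<SubCache<'data>>,
}

impl<'data> Cache<'data> {
    /// Main-cache mappings first, then each subcache's, with no `Box<dyn Iterator>`.
    pub fn mappings<'cache>(&'cache self) -> impl Iterator<Item = Mapping<'data>> + 'cache {
        self.mappings
            .iter()
            .copied()
            .chain(self.subcaches.iter().flat_map(|sub| sub.mappings.iter().copied()))
    }
}
```

A caller can then combine the two at the call site with `cache.mappings().flat_map(Mapping::relocations)`, which avoids having to name both lifetimes in a single return type.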
Have pushed a change so that the following now works:
cache
    .mappings()
    .map(DyldCacheMapping::relocations)
    .flatten()
But I am not sure how to reimplement
The
pub trait Captures<'a> { }
impl<'a, T: ?Sized> Captures<'a> for T { }
impl<'data, E, R> DyldCache<'data, E, R>
where
    E: Endian,
    R: ReadRef<'data>,
{
    /// Return all the relocations in this cache.
    pub fn relocations<'cache>(
        &'cache self,
    ) -> impl Iterator<Item = DyldRelocation> + Captures<'cache> + Captures<'data> {
        self.mappings().flat_map(DyldCacheMapping::relocations)
    }
}
Defining our own iterator instead of using I'd prefer to just leave this out instead of doing workarounds. Is that okay? I don't think that writing the
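For readers who want to see the Captures workaround compile in isolation, here is a small self-contained version on toy types. `Mapping` and `Cache` are hypothetical simplifications and relocations are reduced to plain `u64` addresses; only the `Captures` trait and the shape of the `relocations` signature are taken from the comment above.

```rust
/// Marker trait that lets an `impl Trait` return type name a lifetime
/// without requiring the hidden type to outlive it.
pub trait Captures<'a> {}
impl<'a, T: ?Sized> Captures<'a> for T {}

/// Hypothetical mapping borrowing relocation offsets from the file data.
#[derive(Clone, Copy)]
pub struct Mapping<'data> {
    pub address: u64,
    pub relocs: &'data [u64],
}

impl<'data> Mapping<'data> {
    pub fn relocations(self) -> impl Iterator<Item = u64> + 'data {
        let base = self.address;
        self.relocs.iter().map(move |&offset| base.wrapping_add(offset))
    }
}

pub struct Cache<'data> {
    pub mappings: Vec<Mapping<'data>>,
}

impl<'data> Cache<'data> {
    pub fn mappings<'cache>(&'cache self) -> impl Iterator<Item = Mapping<'data>> + 'cache {
        self.mappings.iter().copied()
    }

    /// The hidden iterator type mentions both `'cache` and `'data`,
    /// so both are named in the bounds via `Captures`.
    pub fn relocations<'cache>(
        &'cache self,
    ) -> impl Iterator<Item = u64> + Captures<'cache> + Captures<'data> {
        self.mappings().flat_map(Mapping::relocations)
    }
}
```

The extra bounds carry no behavior; they exist only so that both lifetimes appear in the return type.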
@@ -282,6 +283,33 @@ pub const VM_PROT_WRITE: u32 = 0x02;
/// execute permission
pub const VM_PROT_EXECUTE: u32 = 0x04;

#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PtrauthKey {
This isn't part of the file format that I can see from the usage in this PR. I think it should be moved into read::macho, and probably given a better name. I assume this is specific to arm, so it would be good to have that in the name.
}

/// Pointer auth data
pub struct Ptrauth {
Same as for PtrauthKey.
/// UNUSED: moved to imagesOffset to prevent older dsc_extarctors from crashing
pub images_offset_old: U32<E>,
/// UNUSED: moved to imagesCount to prevent older dsc_extarctors from crashing
pub images_count_old: U32<E>,
This is a breaking change, and furthermore the old names are now the names of fields in a different position, so this could silently break users. Which of the new fields that you have added do you actually need?
This is how Apple rolls. I've updated this structure to match the latest dyld source. Whenever interacting with Apple structures like this I would not rely on structure locations to stay the same, but write wrappers around where they are used that handle the change in their usage in dyld. I figure that's roughly why the functions in the read module exist?
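A wrapper of the kind described might look like the sketch below. Everything in it is hypothetical: `ToyHeader` is not the real header layout, `TOY_MIN_SIZE_WITH_NEW_IMAGES` is an invented threshold, and using `mapping_offset` as a proxy for the header size is an assumption made for illustration. Only the old and new field names come from the diff above.

```rust
/// Toy header carrying both the old and the relocated image-table fields.
/// The layout is invented for illustration and is NOT the real dyld header.
pub struct ToyHeader {
    /// Offset of the first mapping; assumed here to double as the header size.
    pub mapping_offset: u32,
    pub images_offset_old: u32,
    pub images_count_old: u32,
    pub images_offset: u32,
    pub images_count: u32,
}

/// Hypothetical threshold: headers at least this large carry the new fields.
pub const TOY_MIN_SIZE_WITH_NEW_IMAGES: u32 = 0x1c8;

impl ToyHeader {
    /// Returns `(offset, count)` of the image table for either header
    /// generation, so callers never touch the raw fields directly.
    pub fn images_location(&self) -> (u32, u32) {
        if self.mapping_offset >= TOY_MIN_SIZE_WITH_NEW_IMAGES {
            (self.images_offset, self.images_count)
        } else {
            (self.images_offset_old, self.images_count_old)
        }
    }
}
```

Callers that go through an accessor like this keep working when a field moves; callers that read the renamed raw fields directly are the ones the reviewer's concern about silent breakage applies to.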
src/read/macho/dyld_cache.rs
Outdated
let file_offset: u64 = info.file_offset.get(*endian) + mapping_offset + offset;
let pointer: macho::DyldCacheSlidePointer5 = data
    .read_at::<U64<E>>(file_offset)
    .unwrap()
Don't use unwrap. This crate must never panic for bad input data.
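As a rough illustration of the no-panic rule, the sketch below reads a 64-bit value at a computed offset and reports every failure as an error. It uses only the standard library; `ReadError` and `read_slide_pointer` are hypothetical names, and the little-endian decoding is an assumption for the example, not the crate's read API. Note that the plain `+` in the quoted diff can also overflow on hostile input, so the sketch uses checked addition as well.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct ReadError(pub &'static str);

/// Read a little-endian `u64` at `file_offset + mapping_offset + offset`
/// without panicking on overflow or out-of-bounds input.
pub fn read_slide_pointer(
    data: &[u8],
    file_offset: u64,
    mapping_offset: u64,
    offset: u64,
) -> Result<u64, ReadError> {
    // Checked arithmetic: a corrupt header must not trigger an overflow panic.
    let pos = file_offset
        .checked_add(mapping_offset)
        .and_then(|v| v.checked_add(offset))
        .ok_or(ReadError("slide pointer offset overflows"))?;
    let start = usize::try_from(pos).map_err(|_| ReadError("offset does not fit in usize"))?;
    let end = start
        .checked_add(8)
        .ok_or(ReadError("slide pointer end overflows"))?;
    // `get` returns `None` instead of panicking when the range is out of bounds.
    let bytes = data
        .get(start..end)
        .ok_or(ReadError("slide pointer is out of bounds"))?;
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes); // lengths match: `bytes` is exactly 8 bytes
    Ok(u64::from_le_bytes(raw))
}
```

Every failure path returns `Err`, so truncated or malicious data surfaces as an error the caller can handle.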
I've removed the use of unwrap, but noticed there is one stray in DyldCache::parse that might have been missed previously.
let sc_header = macho::DyldCacheHeader::<E>::parse(data)?;
if &sc_header.uuid != uuid {
let header = macho::DyldCacheHeader::<E>::parse(data)?;
if &header.uuid != uuid {
Why change these?
It just seems neater when the variable is only used once. Can revert if you'd like?
Thanks, I appreciate the contribution. There are breaking changes here, so I plan to do a release before merging, and I will try to do some follow-up work at that time.
Add mapping and relocation iterators to the DyldCache class that handle modern dyld shared cache formats.