Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve char.IsWhiteSpace for non-ASCII #111569

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

MihaZupan
Copy link
Member

char.IsWhiteSpace looks something like

bool IsWhiteSpace(char c)
{
    if (c < 256) return Latin1LookupTable[c]:

    return Fallback(c);
}

static bool Fallback(char ch)
{
    nuint offset = GetCategoryCasingTableOffsetNoBoundsChecks(ch);
    return (sbyte)Unsafe.AddByteOffset(ref MemoryMarshal.GetReference(CategoriesValues), offset) < 0;
}

If you just run a microbenchmark over this with non-ASCII input, things will be fine as PGO will inline the whole thing together.

But if IsWhiteSpace ends up tiering up without seeing data super skewed for non-ASCII, the GetCategoryCasingTableOffsetNoBoundsChecks call in the fallback case won't be inlined, slowing down any future queries for non-ASCII chars.

public class NonAsciiTest
{
    private readonly string _text = string.Create(50, 0, (buffer, _) => new Random(42).NextBytes(MemoryMarshal.Cast<char, byte>(buffer)));

    public NonAsciiTest()
    {
        for (int i = 0; i < 20; i++)
        {
            for (int j = 0; j < 150; j++)
            {
                _ = char.IsWhiteSpace((char)j);
            }

            Thread.Sleep(16);
        }
    }

    [Benchmark]
    public int CountWhiteSpace()
    {
        int count = 0;
        foreach (char c in _text)
        {
            if (char.IsWhiteSpace(c))
            {
                count++;
            }
        }
        return count;
    }
}
Method Toolchain Mean Error Ratio
CountWhiteSpace main 152.94 ns 3.041 ns 1.00
CountWhiteSpace pr 95.31 ns 0.439 ns 0.62

@MihaZupan MihaZupan added this to the 10.0.0 milestone Jan 18, 2025
@MihaZupan MihaZupan self-assigned this Jan 18, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant