Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In-Place Dense Matrix Transposition #2199

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jessicapriebe
Copy link
Contributor

@jessicapriebe jessicapriebe commented Jan 30, 2025

Added a new kernel for In-Place Dense Matrix Transposition, based on Algorithm 467 by Brenner (DOI: 10.1145/355611.362542).

Performance:
Compared to the existing kernel, the added method provides significant performance benefits in a single-threaded context:

  • Faster in 82.7% of cases.
  • Average speedup of 25.8%.

Note:
Performance measurements were restricted to cases where the existing kernel yields correct results. Similar or even better performance could be observed across all cases.

Future Work:
The divisors operate on different indices of the array, allowing for parallelization and offering additional performance improvements in multi-threaded scenarios.

@mboehm7

Copy link
Contributor

@Baunsgaard Baunsgaard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool, thanks for the PR, and finding bugs in my transpose in place.

I have only a few comments.

matrix[comp] = cval;
break;
}
if(prevComp == start) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else if?

int prevOrig = prevIndexCycle(orig, rows, (maxIndex + 1) / rows);
int prevComp = maxIndex - prevOrig;

while(true) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can probably help the compiler and improve performance a bit, by moving the content of this loop to another function.

furthermore, is this loop possible to do in parallel?


LibMatrixReorg.transposeInPlaceDenseBrenner(X, 1);

TestUtils.compareMatrices(X, tX, 1e-8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set epsilon to 0.

Copy link

codecov bot commented Jan 30, 2025

Codecov Report

Attention: Patch coverage is 94.87179% with 6 lines in your changes missing coverage. Please review.

Project coverage is 72.30%. Comparing base (f8522a7) to head (c398bfa).
Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
...ache/sysds/runtime/matrix/data/LibMatrixReorg.java 94.87% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2199      +/-   ##
============================================
+ Coverage     71.88%   72.30%   +0.42%     
- Complexity    44701    45023     +322     
============================================
  Files          1449     1452       +3     
  Lines        169182   169417     +235     
  Branches      32980    33059      +79     
============================================
+ Hits         121617   122498     +881     
+ Misses        38237    37602     -635     
+ Partials       9328     9317      -11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

2 participants