Skip to content
This repository has been archived by the owner on Oct 30, 2023. It is now read-only.

Improve memory usage of SendWorkerOneMessageToManyRequest #72

Open
wants to merge 3 commits into
base: trunk
Choose a base branch
from

Conversation

dlogothetis
Copy link
Contributor

@dlogothetis dlogothetis commented Apr 26, 2018

https://issues.apache.org/jira/browse/GIRAPH-1190

The current implementation takes incoming messages stored as ByteArrayOneMessageToManyIds and prepares them as a map from partition id to a ByteArrayVertexIdMessages, which holds the messages for the corresponding partition. It then adds these to the message store.

However, it is possible that these intermediate lists of message get big before they are added to the message store. If they reach the capacity of the underlying buffers, the job fails. This can be avoided if we push these lists to the message store before the get big. This is mostly beneficial when we use a combiner in which case the message store keeps only one value per vertex.

Tests:

  • Unit tests
  • Compared performance on a large dataset and it is similar to the existing implementation.
  • Verified that job that would otherwise fail, it succeeds.

Maja Kabiljo and others added 2 commits April 26, 2018 08:24

@Parameterized.Parameter(value = 0)
public int numVertices;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like if you only need a single parameter, you can just use:
@parameters
public static Object[] data() {
return new Object[] { "first test", "second test" };
}

https://github.com/junit-team/junit4/wiki/parameterized-tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll simplify this then, thanks!

// Data to send
int partitionId = 13;
Partition<IntWritable, IntWritable, IntWritable> partition =
public static class NonParameterizedTest extends SharedSetup {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for not using Parameterized tests for this two test cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These other tests are a bit different with respect to what the parameter would be and the range of values. For now, I just wanted to make this particular test parameterized.

@yukselakinci
Copy link
Contributor

Not sure why the commits show Graph-1185. Do you have to rebase?

@dlogothetis
Copy link
Contributor Author

I accidentally created a commit with the same message as the previous commit. When this lands, it'll get a new commit message anyway.

Copy link
Contributor

@yukselakinci yukselakinci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs better unit tests.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants