Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix State transition err when hdfs ccm feature is used #77

Closed

Conversation

TanYuxin-tyx
Copy link

When we used dynamometer to test HDFS performance, the test encountered a error when generate DataNode Block info. The error stack is
Error: java.io.IOException: State transition not allowed; from DEFAULT to FILE_WITH_REPLICATION at com.linkedin.dynamometer.blockgenerator.XMLParser.transitionTo(XMLParser.java:107) at com.linkedin.dynamometer.blockgenerator.XMLParser.parseLine(XMLParser.java:77) at com.linkedin.dynamometer.blockgenerator.XMLParserMapper.map(XMLParserMapper.java:53) at com.linkedin.dynamometer.blockgenerator.XMLParserMapper.map(XMLParserMapper.java:26) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:151) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:828) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1690) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

After checking Fsimage xml and the source code, we find that XMLParser can not parse the lines correctly, these lines are like
<directive><id>8963</id><path>/user/somepath/path1</path><replication>3</replication><pool>cache_other_pool</pool><expiration><millis>1544454142310</millis><relatilve>false</relatilve></expiration> <directive><id>8964</id><path>/user/somepath/path2</path><replication>3</replication><pool>cache_hadoop-data_pool</pool><expiration><millis>1544497817686</millis><relatilve>false</relatilve></expiration> <directive><id>8965</id><path>/user/somepath/path3</path><replication>3</replication><pool>cache_hadoop-peisong_pool</pool><expiration><millis>1544451500312</millis><relatilve>false</relatilve></expiration> <directive><id>8967</id><path>/user/somepath/path4</path><replication>3</replication><pool>cache_other_pool</pool><expiration><millis>1544497602570</millis><relatilve>false</relatilve></expiration>

These fsimage xml lines are generated when HDFS Centralized Cache Management (CCM) feature is used.

So we add a patch to ignore the CCM fsimage xml lines when parsing xml.

@TanYuxin-tyx
Copy link
Author

@xkrogen hi Erik, could you please have a look?

@xkrogen
Copy link
Collaborator

xkrogen commented Feb 15, 2019

Hi @yuxintan , thanks for reporting this. We don't use the caching feature so we haven't encountered this issue.

I wonder if there's a better way to handle this by adding support for the CCM feature to Dynamometer? I'm not sure how much work this would be, since I don't know much about how it works.

For this patch in particular, I'm fine with excluding the directives, but not too sure about the current logic. First off, is there really no </directive> closing the XML tag? If not, this is a bug with the XML generator -- the XML you've pasted is broken. I took a look at the code, and at least on branch-2.7, I see code that should be generating the closing tag.

Second, is it guaranteed that a line containing <directive> will never also contain a real file tag, and that the directive line won't be split across multiple lines? I want to ensure that we don't accidentally skip files, and that we don't accidentally only skip part of the directive. If possible, I would rather skip the entire <CacheManagerSection>...</CacheManagerSection> to avoid any such issues and make it very clear what is being skipped.

@TanYuxin-tyx
Copy link
Author

@xkrogen Thanks very much for reviewing the patch.

CCM feature improve HDFS performance by storing some specific files' blocks in DataNode's memory, NameNode only need save these paths in its memory, there is little or no effect on NN. So we don't add support for the CCM feature to Dynamometer.

For the XML tag issue, it's the lack of clarity in my description led to confusion. There is </directive> closing the XML tag after <directive><id> line. The CCM XML lines are exactly like

<directive><id>8963</id><path>/user/somepath/path1</path><replication>3</replication><pool>cache_other_pool</pool><expiration><millis>1544454142310</millis><relatilve>false</relatilve></expiration> 
</directive>
<directive><id>8964</id><path>/user/somepath/path2</path><replication>3</replication><pool>cache_hadoop-data_pool</pool><expiration><millis>1544497817686</millis><relatilve>false</relatilve></expiration> 
</directive>
<directive><id>8965</id><path>/user/somepath/path3</path><replication>3</replication><pool>cache_hadoop-peisong_pool</pool><expiration><millis>1544451500312</millis><relatilve>false</relatilve></expiration> 
</directive>
<directive><id>8967</id><path>/user/somepath/path4</path><replication>3</replication><pool>cache_other_pool</pool><expiration><millis>1544497602570</millis><relatilve>false</relatilve></expiration>
</directive>

and the directive line won't be split across multiple lines, which are similar to other XML <inode> lines.

The patch only ignores these lines and bypass the problem, but I'm quite of your opinion that it's more elegant to avoid any such issues by skipping the entire <CacheManagerSection>...</CacheManagerSection>.

Thank you very much for your attention and kindly advice.

@xkrogen
Copy link
Collaborator

xkrogen commented Feb 19, 2019

Got it, thanks for the detailed description.

I'm re-visiting this and feel like the right way to solve this is actually to add a new State, INODE_SECTION. When the <INodeSection> tag is seen, transition into INODE_SECTION, when </INodeSection> is seen, transition out. Then, the rest of parseLine() can simply ignore text whenever the state is DEFAULT. Then any <inode> tag outside of the INode section is completely ignored.

Let me know what you think.

@xkrogen
Copy link
Collaborator

xkrogen commented Feb 27, 2019

Hey @yuxintan , I put up a PR with my described idea at PR #82, can you check and see if this solves your issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants