Output format code2vec #215

Open
adkonr opened this issue Jul 26, 2022 · 2 comments

adkonr commented Jul 26, 2022

Hello everyone,

I am trying to parse C++ files into the code2vec format for further processing.
However, when I run the CLI with the following config, the output is saved as .c2s instead of the expected .c2v format.
Is this a bug?
If not, how do I get the code2vec format?

Thanks for your help!

inputDir: dataset/input/
outputDir: output

parser:
  name: fuzzy
  languages: [cpp]

label:
  name: file name

storage:
  name: code2vec
  maxPathLength: 1000
  maxPathWidth: 1000

@zunairazaman2021

@adkonr were you able to solve this issue? I am facing the same problem.

@zunairazaman2021

@vovak I am using this YAML file to extract JS code into the code2vec format, but the output still comes out in the code2seq format. Can you help me here?

`# input directory (path to project)
inputDir: /Users/zunaira/Desktop/JScode2vec/testerinput

# output directory
outputDir: /Users/zunaira/Desktop/JScode2vec/res3

# parse Java & JavaScript files with ANTLR parser
parser:
  name: antlr
  languages: [js]

filters:
  - name: by tree size # exclude the trees that have > 1000 nodes
    maxTreeSize: 1000
  - name: by words number
    maxTokenWordsNumber: 1000

# use file names as labels
# this selects the file level granularity
label:
  name: file name

# save to disk in the Code2Vec format
storage:
  name: code2vec
  maxPathLength: 8
  maxPathWidth: 2
  maxPaths: 1000000
  maxTokens: 100000
  maxPathContextsPerEntity: 5

# number of threads used for parsing
# the default is one thread
numOfThreads: 4
`

[Image: Screenshot 2023-02-20 at 14 34 44]
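
Since the question in this thread is whether a `.c2s` file really contains code2seq-style data or just carries that extension, one way to check is to inspect the contents rather than the file name. Below is a minimal sketch, not a confirmed description of the tool's output. It assumes that id-based (code2vec-style) output stores each line as a label followed by space-separated `startTokenId,pathId,endTokenId` integer triples, and that text-based (code2seq-style) output contains token and node names instead. The function names `classify_line` and `classify_file` are hypothetical helpers, not part of any library.

```python
import re

# Assumption: an id-based path context is three comma-separated integers, e.g. "12,345,67".
ID_TRIPLE = re.compile(r"^\d+,\d+,\d+$")


def classify_line(line: str) -> str:
    """Guess whether one output line holds id-based or text-based path contexts."""
    parts = line.split()
    contexts = parts[1:]  # parts[0] is assumed to be the label
    if not contexts:
        return "empty"
    if all(ID_TRIPLE.match(c) for c in contexts):
        return "id-based"
    return "text-based"


def classify_file(path: str, sample: int = 100) -> dict:
    """Classify the first `sample` lines of a file and count each kind."""
    counts = {}
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= sample:
                break
            kind = classify_line(line.strip())
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Calling `classify_file` on the generated `.c2s` file returns a dictionary of counts per kind. If nearly all sampled lines come back as `id-based`, the contents may already match what the original poster wants despite the extension; if they come back as `text-based`, the storage setting is probably not being applied. Either result is only a hint under the assumptions stated above.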
