trouble getting started #2

Open
cnjr2 opened this issue Sep 9, 2015 · 14 comments

@cnjr2

cnjr2 commented Sep 9, 2015

Thank you for the development of ProFET!

I wanted to try it out but I ran into some trouble. It would be great if you could point me towards where I am going wrong.

I am using Python 3.4 and have installed all the dependencies mentioned in the README.md. I have the following folder structure, where feat_extract is my working directory:

feat_extract/
|_pipeline.py
|_other ProFET files...
|_test_seq/...
|_train/
| |_A/
| | |_train_sequences_A.fasta
| |_B/
|   |_train_sequences_B.fasta
|_test
  |_A/
  | |_test_sequences_A.fasta
  |_B/
    |_test_sequences_B.fasta

The fasta files were created with the following set of commands:

    cd ./test_seq/Extracellular/
    tail -n 1000 location-secreted_keyword-AKW-0964_reviewed_taxon-Tetrapoda_fragment-no_id-0.9.fasta > ../../train/A/train_sequences_A.fasta
    tail -n 1000 NOT-secreted_NOT-extracellular_reviewed_taxon-Tetrapoda_fragment-no_id-0.5.fasta > ../../train/B/train_sequences_B.fasta
    head -n 1000 location-secreted_keyword-AKW-0964_reviewed_taxon-Tetrapoda_fragment-no_id-0.9.fasta > ../../test/A/test_sequences_A.fasta
    head -n 1000 NOT-secreted_NOT-extracellular_reviewed_taxon-Tetrapoda_fragment-no_id-0.5.fasta > ../../test/B/test_sequences_B.fasta
    cd ../../

When running the command:

python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType dir

I get the following error message:

<cProfile.Profile object at 0x107745db0>
Starting to extract features from training set
dirr change to: ./train
Multiclass fasta_files list found: []
Features generated
Removing any all zero features
df.shape:  (0, 0)
df_cleaned shape:  (0, 0)
Done
Extracted training data features
Training predictive model
Traceback (most recent call last):
  File "pipeline.py", line 171, in <module>
    res = profiler.runcall(pipeline)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/cProfile.py", line 109, in runcall
    return func(*args, **kw)
  File "pipeline.py", line 90, in pipeline
    model, lb_encoder = trainClassifier(filename=trainingDir+'/trainingSetFeatures.csv',normFlag= False,classifierType= classifierType,kbest= 0,alpha= False,optimalFlag= False) #Win
  File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 114, in trainClassifier
    features, labels, lb_encoder,featureNames = load_data(filename, 'file')
  File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 36, in load_data
    df = pd.read_csv(dataFrame, index_col=[0,1]) # is index column 0 in multiindex as well?
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 474, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 250, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 566, in __init__
    self._make_engine(self.engine)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 705, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 1072, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 350, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3173)
  File "pandas/parser.pyx", line 594, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:5912)
OSError: File b'./train/trainingSetFeatures.csv' does not exist

It complains that './train/trainingSetFeatures.csv' does not exist. I see that a file with this name is being created in the train folder; however, it is a table with only column names (no rows).

Thank you for your help.

@ddofer
Owner

ddofer commented Sep 9, 2015

The problem is that no features are extracted (not sure why).
Have you tried extracting features using the "file" option instead of "dir"?
I'll be uploading an update in the next few days.

@cnjr2
Author

cnjr2 commented Sep 9, 2015

I have tried the following:

python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType file

with the same result: './train/trainingSetFeatures.csv' does not exist.

Thanks for the update, I am looking forward to it.

@ddofer
Owner

ddofer commented Sep 14, 2015

What OS are you using? Try using the absolute file path.
The update has been implemented.

@ddofer
Owner

ddofer commented Sep 15, 2015

Also, the tail command outputs lines, not whole sequence records, so it could have messed up the FASTA-formatted files.
https://en.wikipedia.org/wiki/Tail_(Unix)
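
To illustrate the point about tail: head -n and tail -n count lines rather than records, so the cut can land in the middle of a sequence and leave a file that begins with sequence residues instead of a ">" header line. Below is a minimal sketch of a record-aware split using only the Python standard library. The function names (read_fasta_records, split_records, write_fasta) are hypothetical and are not part of ProFET.

```python
def read_fasta_records(lines):
    """Group FASTA lines into (header, sequence) records."""
    records = []
    header, seq_parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(seq_parts)))
            header, seq_parts = line, []
        elif header is not None:
            seq_parts.append(line)
        # Sequence lines seen before the first header are orphaned
        # fragments (e.g. left behind by a line-based cut) and are dropped.
    if header is not None:
        records.append((header, "".join(seq_parts)))
    return records


def split_records(records, n_test):
    """Send the first n_test records to the test set, the rest to training."""
    return records[:n_test], records[n_test:]


def write_fasta(records, path, width=60):
    """Write records back out, wrapping sequences at `width` characters."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(header + "\n")
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")
```

Because the split is made on whole records, the two output sets are disjoint and every output file starts with a header line.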

Dan Ofer - דן עופר
Publications http://scholar.google.co.il/citations?hl=en&user=uDx2ItYAAAAJ

Photography
http://picasaweb.google.com/ddofer
http://500px.com/DanOfer

@cnjr2
Author

cnjr2 commented Sep 20, 2015

Thanks Dan for your reply and the recent update.

I have now fixed the .fasta files and I have rerun ProFET with the same instructions as before:

python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType dir

which gives this now:

<cProfile.Profile object at 0x1070c3590>
Starting to extract features from training set
Multiclass fasta_files list found: ['./train/B/train_sequences_B.fasta', './train/A/train_sequences_A.fasta']
Getting features from a single fasta file- dict_keys(['./train/B/train_sequences_B.fasta'])
Getting features from a single fasta file- dict_keys(['./train/A/train_sequences_A.fasta'])
Features generated
Removing any all zero features
df.shape:  (254, 1170)
df_cleaned shape:  (254, 1170)
Done
Extracted training data features
Training predictive model
Features files does not contains labels
Traceback (most recent call last):
  File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 49, in load_data
    df.set_index(keys = ['accession', 'classname'], inplace=True)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 2607, in set_index
    level = frame[col].values
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/generic.py", line 1084, in _get_item_cache
    values = self._data.get(item)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/internals.py", line 2851, in get
    loc = self.items.get_loc(item)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)
  File "pandas/hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)
  File "pandas/hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)
KeyError: 'accession'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pipeline.py", line 171, in <module>
    res = profiler.runcall(pipeline)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/cProfile.py", line 109, in runcall
    return func(*args, **kw)
  File "pipeline.py", line 90, in pipeline
    model, lb_encoder = trainClassifier(filename=trainingDir+'/trainingSetFeatures.csv',normFlag= False,classifierType= classifierType,kbest= 0,alpha= False,optimalFlag= False) #Win
  File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 133, in trainClassifier
    features, labels, label_encoder, featureNames = load_data(filename)
  File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 56, in load_data
    df.set_index(keys = 'accession', inplace=True)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 2607, in set_index
    level = frame[col].values
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/generic.py", line 1084, in _get_item_cache
    values = self._data.get(item)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/internals.py", line 2851, in get
    loc = self.items.get_loc(item)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)
  File "pandas/hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)
  File "pandas/hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)
KeyError: 'accession'

So it does now seem to produce the features (i.e. the files trainingSetFeatures.csv and trainingSetNormParams.csv are now generated with some contents), although this time the files are written to the working directory rather than the train folder. It now complains: "Features files does not contains labels". Where am I going wrong?

p.s.: I am on a Mac.
p.p.s: I have also run the command with full paths.
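
The KeyError: 'accession' in the traceback above is raised while the feature file is being indexed by the 'accession' and 'classname' columns, which suggests the generated CSV lacks them. Here is a minimal sketch for checking a feature file before training, using only the standard library. The helper name inspect_feature_csv is hypothetical. The two required column names are taken from the traceback, and whether ProFET expects any other columns is not known from this thread.

```python
import csv


def inspect_feature_csv(path, required=("accession", "classname")):
    """Return (missing_columns, n_data_rows) for a feature CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])        # [] if the file is completely empty
        n_rows = sum(1 for _ in reader)  # remaining lines are data rows
    missing = [name for name in required if name not in header]
    return missing, n_rows
```

A result with zero rows corresponds to the earlier symptom (a table with only column names), while a non-empty list of missing columns corresponds to the KeyError seen here.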

@ddofer
Owner

ddofer commented Sep 20, 2015

There might be an issue with the "dir" option (we didn't use it while writing the articles).
The program is complaining that it's not getting labels/classes for the sequences. Possibly the recent update broke how the "dir" option assigns labels according to directories, but I'm only guessing.
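
For reference, labeling by directory can be sketched as follows: each FASTA file found under the training directory is assigned the name of the folder that contains it. This is only an illustration of the idea under that assumption and is not ProFET's actual implementation. The function name label_fasta_by_directory and the accepted file extensions are hypothetical.

```python
import os


def label_fasta_by_directory(training_dir, extensions=(".fasta", ".fa")):
    """Map each FASTA path under training_dir to its parent folder's name."""
    labels = {}
    for root, _dirs, files in os.walk(training_dir):
        for name in sorted(files):
            if name.lower().endswith(extensions):
                # The class label is the folder that directly holds the file,
                # e.g. train/A/train_sequences_A.fasta is labeled "A".
                labels[os.path.join(root, name)] = os.path.basename(root)
    return labels
```

With the train/A and train/B layout from the first post, this would yield the labels "A" and "B". An empty result would match the "fasta_files list found: []" line in the first log.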

Try running a test case with one of the other labeling options, i.e. --classType file. Tell me if it still isn't working then; that will narrow it down.

@cnjr2
Author

cnjr2 commented Sep 20, 2015

I have now tried changing to the --classType file whilst keeping my folder structure the same:

python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType file

that gives:

<cProfile.Profile object at 0x106f4ed48>
Starting to extract features from training set
Traceback (most recent call last):
  File "pipeline.py", line 171, in <module>
    res = profiler.runcall(pipeline)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/cProfile.py", line 109, in runcall
    return func(*args, **kw)
  File "pipeline.py", line 83, in pipeline
    classType=classType, normParams='.')
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 602, in featExt
    multiClass=True, Dirr = directory)
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 444, in get_features
    features = get_MultiClass_features(trainingSetFlag, classType)
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 397, in get_MultiClass_features
    fasta_files_dict = Get_Dirr_All_Fasta (classType,Dirr)
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 308, in Get_Dirr_All_Fasta
    files_dict[os.path.join(root, name)] = className
UnboundLocalError: local variable 'className' referenced before assignment

@ddofer
Owner

ddofer commented Sep 20, 2015

With --classType file, your classes should be in 2 separate multi-FASTA files, each containing all the sequences belonging to one class (without overlaps/duplicates).

E.g. (in case you have 2 classes), in the directory "Train":
Train/Secreted.fasta
Train/NegSecreted.fasta

And use this directory as the trainingSetDir.
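
The layout above can be checked with a short script before running the pipeline. The sketch below treats each .fasta file in the training directory as one class, named after the file, and reports any record identifiers that appear in more than one class file. The helper names fasta_ids and check_class_files are hypothetical. It compares only identifiers (the first token after ">"), not the sequences themselves, and it is an assumption that ProFET derives the class name from the file name in this way.

```python
import os


def fasta_ids(path):
    """Collect the identifier (first token after '>') of each FASTA record."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    ids.add(tokens[0])
    return ids


def check_class_files(training_dir, extension=".fasta"):
    """Return ({class_name: ids}, ids_shared_between_class_files)."""
    classes = {}
    for name in sorted(os.listdir(training_dir)):
        if name.endswith(extension):
            class_name = name[:-len(extension)]
            classes[class_name] = fasta_ids(os.path.join(training_dir, name))
    seen, shared = set(), set()
    for ids in classes.values():
        shared |= seen & ids  # identifiers already seen in an earlier class
        seen |= ids
    return classes, shared
```

An empty set of shared identifiers means the class files do not overlap by identifier.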

@cnjr2
Author

cnjr2 commented Sep 20, 2015

I now tried:

python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType file

with feat_extract being the working directory and the following folder structure:

[Screenshot of the folder structure: screenshot_20_09_2015_15_41]

I still get:

<cProfile.Profile object at 0x108148d48>
Starting to extract features from training set
Traceback (most recent call last):
  File "pipeline.py", line 171, in <module>
    res = profiler.runcall(pipeline)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/cProfile.py", line 109, in runcall
    return func(*args, **kw)
  File "pipeline.py", line 83, in pipeline
    classType=classType, normParams='.')
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 602, in featExt
    multiClass=True, Dirr = directory)
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 444, in get_features
    features = get_MultiClass_features(trainingSetFlag, classType)
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 397, in get_MultiClass_features
    fasta_files_dict = Get_Dirr_All_Fasta (classType,Dirr)
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 308, in Get_Dirr_All_Fasta
    files_dict[os.path.join(root, name)] = className
UnboundLocalError: local variable 'className' referenced before assignment
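
As general background on this error: Python raises UnboundLocalError when a function reads a local variable on a code path where no assignment to it has run, which typically happens when the variable is only set inside some branches of an if/elif chain. The actual cause inside FeatureGen.py is not established in this thread. The sketch below is a hypothetical stand-in (collect_fasta is not ProFET's function) that shows a defensive pattern in which every branch either assigns the class name or raises a clear error.

```python
import os


def collect_fasta(class_type, directory):
    """Hypothetical helper: map FASTA paths to class names (not ProFET code)."""
    files = {}
    for root, _dirs, names in os.walk(directory):
        for name in names:
            if not name.endswith(".fasta"):
                continue
            if class_type == "dir":
                class_name = os.path.basename(root)      # label = parent folder
            elif class_type == "file":
                class_name = os.path.splitext(name)[0]   # label = file name
            else:
                # Without this branch, an unexpected class_type would leave
                # class_name unassigned and trigger UnboundLocalError below.
                raise ValueError("unknown class type: %r" % (class_type,))
            files[os.path.join(root, name)] = class_name
    return files
```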

@cnjr2
Author

cnjr2 commented Oct 1, 2015

Dear Dan, are there any updates? Thanks for your help!

@ddofer
Owner

ddofer commented Oct 1, 2015

Hi,
I'm afraid that I'll be unable to debug the issue, as I'll be unavailable for the next month.
I suggest forking from the earliest commit in the meantime. I really apologize!
Good luck.

@cnjr2
Author

cnjr2 commented Oct 1, 2015

Thanks for the info. I will give it a shot!

@ddofer
Owner

ddofer commented Oct 1, 2015

Worst case, just use the feature generation methods in FeatureGen.py.

Good luck

@ChalaTuro

Hi,
I am also getting exactly the same error message, i.e. "Features files does not contains labels", along with the associated errors shown above.
Has anyone found a solution?
Thanks
