Bug: TypeError: unsupported operand type(s) for +: 'Tensor' and 'str' #1

Open
omgwenxx opened this issue Jan 14, 2025 · 3 comments

omgwenxx commented Jan 14, 2025

Dear all,

first of all, thank you for creating the framework.

I am trying to create feature embeddings for item descriptions (an excerpt of my file is shown below):

article_id	product_code	prod_name	product_type_no	product_type_name	product_group_name	graphical_appearance_no	graphical_appearance_name	colour_group_code	colour_group_name	perceived_colour_value_id	perceived_colour_value_name	perceived_colour_master_id	perceived_colour_master_name	department_no	department_name	index_code	index_name	index_group_no	index_group_name	section_no	section_name	garment_group_no	garment_group_name	detail_desc
0108775015	0108775	Strap top	253	Vest top	Garment Upper body	1010016	Solid	09	Black	4	Dark	5	Black	1676	Jersey Basic	A	Ladieswear	1	Ladieswear	16	Womens Everyday Basics	1002	Jersey Basic	Jersey top with narrow shoulder straps.
0108775044	0108775	Strap top	253	Vest top	Garment Upper body	1010016	Solid	10	White	3	Light	9	White	1676	Jersey Basic	A	Ladieswear	1	Ladieswear	16	Womens Everyday Basics	1002	Jersey Basic	Jersey top with narrow shoulder straps.
0108775051	0108775	Strap top (1)	253	Vest top	Garment Upper body	1010017	Stripe	11	Off White	1	Dusty Light	9	White	1676	Jersey Basic	A	Ladieswear	1	Ladieswear	16	Womens Everyday Basics	1002	Jersey Basic	Jersey top with narrow shoulder straps.

This is my ducho_config.yaml:

dataset_path: ./data_raw/h-and-m-personalized-fashion-recommendations
gpu list: 0

textual:
    items:
        input_path: articles.tsv
        item_column: article_id
        text_column: detail_desc
        output_path: textual_embeddings_32
        model: [
          { model_name: sentence-transformers/all-mpnet-base-v2,
              output_layers: 1,
              clear_text: False,
              backend: sentence_transformers,
              batch_size: 32
          }
        ]

And this is the error message I get:

TypeError: unsupported operand type(s) for +: 'Tensor' and 'str'

Traceback (most recent call last):
  File "your_script.py", line 59, in <module>
    main()
  File "your_script.py", line 54, in main
    extractor_obj.execute_extractions()
  File "Runner.py", line 222, in execute_extractions
    self.do_extraction(modality, source)
  File "Runner.py", line 272, in do_extraction
    _execute_extraction_from_models_list(
        models=models,
        extractor_class=extractor,
        gpu=self._config.get_gpu(),
        dataset=dataset,
    )
  File "Runner.py", line 113, in _execute_extraction_from_models_list
    dataset.create_output_file(
        batch, 
        extractor_output, 
        model_layer,
        fusion=model.get('fusion')
    )
  File "TextualDataset.py", line 183, in create_output_file
    output_file_name = [f + '.npy' for f in filenames]
  File "TextualDataset.py", line 183, in <listcomp>
    output_file_name = [f + '.npy' for f in filenames]

BR,
Gwen

@omgwenxx
Author

omgwenxx commented Jan 14, 2025

I made it work by adapting line 183 in TextualDataset.py

from output_file_name = [f + '.npy' for f in filenames]

to

output_file_name = [str(f.item()) + '.npy' for f in filenames]

because each element of filenames is a tensor rather than a string (as the error message indicates).
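A slightly more defensive variant of this change, as a minimal sketch: the helper name `to_output_name` is hypothetical and not part of Ducho. It assumes the identifiers are either plain strings or scalar objects exposing `.item()` (0-dim PyTorch tensors and NumPy scalars both do), so the same line works whether or not the IDs were batched into a tensor.

```python
def to_output_name(identifier, extension=".npy"):
    """Build an output file name from an item identifier.

    Hypothetical helper, not part of Ducho. Accepts plain strings as well
    as scalar objects that expose ``.item()``.
    """
    if hasattr(identifier, "item"):
        # Unwrap the scalar, e.g. tensor(108775015) -> 108775015
        identifier = identifier.item()
    return str(identifier) + extension


# Usage with plain Python values; scalar tensors go through the .item() branch.
filenames = ["0108775015", 108775044]
output_file_name = [to_output_name(f) for f in filenames]
print(output_file_name)  # ['0108775015.npy', '108775044.npy']
```

Note that an ID that has already been parsed as an integer has lost its leading zero, so the resulting file name will not match the original `article_id` string unless it is padded again.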

Now the code runs for a few batches and then fails with another error:
TypeError: Caught TypeError in DataLoader worker process 4.

BR,
Gwen

@omgwenxx
Author

The second error occurred because one item in my dataset had no description. I have now been able to create all text embeddings with the Ducho framework.
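One way to filter out such items before running Ducho, as a minimal sketch using only the standard library: the function name `drop_items_without_text` is hypothetical, and the code assumes a tab-separated file with a header row and default CSV quoting, with `detail_desc` as the text column as in the config above.

```python
import csv


def drop_items_without_text(src_path, dst_path, text_column="detail_desc"):
    """Copy a TSV file, skipping rows whose text column is missing or blank.

    Hypothetical helper, not part of Ducho. Returns the number of dropped rows.
    """
    dropped = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, delimiter="\t")
        writer.writeheader()
        for row in reader:
            text = row.get(text_column)
            # Short rows yield None; empty or whitespace-only text is also dropped.
            if text is None or not text.strip():
                dropped += 1
                continue
            writer.writerow(row)
    return dropped
```

For example, `drop_items_without_text("articles.tsv", "articles_clean.tsv")` would write a filtered copy (the output file name is an assumption), and `input_path` in the config would then point to that copy. Because the file is handled as text throughout, identifiers such as `0108775015` keep their leading zeros.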

@danielemalitesta
Member

Hi Gwen,

Sorry for the delay in the answer, and thank you for the provided feedback.

Yes, we are aware of this issue, which arises when some items in the item set are missing multimodal information.

While a simple solution might be to drop those items before running Ducho (as you probably did), we are working towards integrating imputation methods into the framework to make it work even when having items with missing modalities.

See this recent paper of ours: https://dl.acm.org/doi/10.1145/3627673.3679898.

Stay tuned, and thanks again for the feedback!

Best,
Daniele
