#AttributeError: 'pyarrow.lib.ListType' object has no attribute 'list_size' #13

Open
aditya6396 opened this issue Jan 8, 2025 · 1 comment

Comments

@aditya6396

Thank you for your reply. I arranged my data the way you described in your last reply, but when I normalize the dataset I get the error shown below.
I am using this command:
uv run python scripts/fit_embedding_normalizer.py --ds pretraining_data:1 --save_path "my local path" --max_nb_samples 1000000

My data YAML file:
name: "pretraining_data"
parquet_path:
  s3: "wiki_data"
source_column: "text_sentences_sonar_emb"
source_text_column: "text_sentences"
partition columns:
Is this caused by the Python version? I am using Python 3.10; should I use Python 3.11 instead?

Traceback (most recent call last):
  File "/home/cpatwadityasharma/large_concept_model/scripts/fit_embedding_normalizer.py", line 101, in <module>
    main(args.ds, args.save_path, args.max_nb_samples)
  File "/home/cpatwadityasharma/large_concept_model/scripts/fit_embedding_normalizer.py", line 79, in main
    embs = sample_sentences_from_mixed_sources(
  File "/home/cpatwadityasharma/large_concept_model/scripts/fit_embedding_normalizer.py", line 52, in sample_sentences_from_mixed_sources
    vecs = pyarrow_fixed_size_array_to_numpy(pc.list_flatten(batch[column]))[
  File "/home/cpatwadityasharma/large_concept_model/.venv/lib/python3.10/site-packages/stopes/utils/arrow_utils.py", line 152, in pyarrow_fixed_size_array_to_numpy
    assert cc.type.list_size is not None
AttributeError: 'pyarrow.lib.ListType' object has no attribute 'list_size'
0it [00:01, ?it/s]

@artemru
Contributor

artemru commented Jan 9, 2025

This is probably the same problem as the missing FixedSizeListArray typing discussed in #9 (comment).
See also #12.
