Thank you for the reply. I arranged my data as you described in your last reply, but now, while normalizing the dataset, I get the error shown below.

I am running this command:

```shell
uv run python scripts/fit_embedding_normalizer.py --ds pretraining_data:1 --save_path "my local path" --max_nb_samples 1000000
```
My dataset YAML file:

```yaml
name: "pretraining_data"
parquet_path:
  s3: "wiki_data"
source_column: "text_sentences_sonar_emb"
source_text_column: "text_sentences"
partition_columns:
```
Is this error caused by the Python version? I am using Python 3.10; should I switch to Python 3.11?
```
Traceback (most recent call last):
  File "/home/cpatwadityasharma/large_concept_model/scripts/fit_embedding_normalizer.py", line 101, in <module>
    main(args.ds, args.save_path, args.max_nb_samples)
  File "/home/cpatwadityasharma/large_concept_model/scripts/fit_embedding_normalizer.py", line 79, in main
    embs = sample_sentences_from_mixed_sources(
  File "/home/cpatwadityasharma/large_concept_model/scripts/fit_embedding_normalizer.py", line 52, in sample_sentences_from_mixed_sources
    vecs = pyarrow_fixed_size_array_to_numpy(pc.list_flatten(batch[column]))[
  File "/home/cpatwadityasharma/large_concept_model/.venv/lib/python3.10/site-packages/stopes/utils/arrow_utils.py", line 152, in pyarrow_fixed_size_array_to_numpy
    assert cc.type.list_size is not None
AttributeError: 'pyarrow.lib.ListType' object has no attribute 'list_size'
0it [00:01, ?it/s]
```