TestsetGenerator.from_langchain: generation fails, randomly getting stuck anywhere from 0% to 80% #1003

pp6699 · 2024-06-02T21:40:38Z

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question
Generation with TestsetGenerator.from_langchain fails: the progress bar randomly gets stuck somewhere between 0% and 80%, and tokens keep being consumed from my OpenAI API account the whole time.
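When a run stalls like this, one generic way to make the hang visible is to put an upper bound on how long you wait for each step, so it surfaces as an explicit timeout instead of a progress bar that never moves. The standard-library sketch below is not ragas's own mechanism (the generator manages its own workers internally); `run_with_timeout` and `fake_llm_call` are hypothetical names used only for illustration.

```python
import concurrent.futures as cf
import time


def run_with_timeout(fn, *args, timeout_s=60.0):
    """Run fn(*args) in a worker thread and stop waiting after timeout_s seconds.

    Returns (True, result) on success or (False, None) on timeout. A Python
    thread cannot be killed, so the worker keeps running in the background:
    this only stops the caller from blocking, it does not cancel the API call.
    """
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args)
    try:
        return True, future.result(timeout=timeout_s)
    except cf.TimeoutError:
        return False, None
    finally:
        # Don't block on a stuck worker; drop anything that hasn't started yet.
        pool.shutdown(wait=False, cancel_futures=True)


def fake_llm_call(delay_s):
    # Hypothetical stand-in for one generation step (e.g. a single LLM request).
    time.sleep(delay_s)
    return f"answered after {delay_s}s"


if __name__ == "__main__":
    print(run_with_timeout(fake_llm_call, 0.01, timeout_s=1.0))  # (True, 'answered after 0.01s')
    print(run_with_timeout(fake_llm_call, 0.5, timeout_s=0.1))   # (False, None)
```

Because the timed-out thread is left running, this pattern tells you *that* a step hung; it does not by itself stop further token usage.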

By the way, the documents I'm embedding are in Chinese. Could that be contributing to this?

I'm a student who has just started programming, so my background knowledge is limited. I'd be very grateful if someone more experienced could make sense of my incomplete question and help me.

Filename and doc_id are the same for all nodes.
Generating:  70%|██████████████████████████████████████████████████████▌                       | 7/10 [01:29<00:43, 14.33s/it]
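The "Filename and doc_id are the same for all nodes." line is a warning, not the failure itself. As far as I can tell, ragas 0.1.x falls back to a document's generated doc_id when `metadata["filename"]` is missing, and `create_documents` in the code below is called without `metadatas`, so every chunk ends up in that state. The sketch sets an explicit filename on each chunk; `Doc` is a hypothetical stand-in for LangChain's `Document` (same `page_content`/`metadata` fields) and `tag_filename` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical stand-in for langchain_core.documents.Document.
    page_content: str
    metadata: dict = field(default_factory=dict)


def tag_filename(docs, filename):
    """Set metadata["filename"] on every chunk that doesn't already have one."""
    for doc in docs:
        doc.metadata.setdefault("filename", filename)
    return docs


chunks = [Doc("### 第一节 ..."), Doc("### 第二节 ...")]
tag_filename(chunks, "output.md")
print([d.metadata for d in chunks])  # [{'filename': 'output.md'}, {'filename': 'output.md'}]
```

With LangChain itself, passing `metadatas=[{"filename": "output.md"}]` to `text_splitter.create_documents` should attach the same metadata to every chunk. This should silence the warning; it is not shown here to fix the stall itself.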

Code Examples

from langchain_text_splitters import RecursiveCharacterTextSplitter
import os
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings


os.environ["OPENAI_API_KEY"] = "sk-xxx"

with open(r"RAGAS\output.md", encoding="utf-8") as f:  # raw string: keep the Windows backslash literal
    state_of_the_union = f.read()

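text_splitter = RecursiveCharacterTextSplitter(
    # Chunks of up to 512 characters (len counts characters, not tokens), overlapping by 128.
    chunk_size=512,
    chunk_overlap=128,
    length_function=len,
    is_separator_regex=False,
    # Only split on "###" headings; with no fallback separators, longer sections stay whole.
    separators=["###"],
)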

documents = text_splitter.create_documents([state_of_the_union])
print(documents[0])

generator_llm = ChatOpenAI(model="gpt-3.5-turbo")
critic_llm = ChatOpenAI(model="gpt-3.5-turbo")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)
from ragas.testset.evolutions import conditional  # the only evolution not imported above

# Translate the evolution prompts into Chinese, then cache the adapted prompts on disk.
generator.adapt(language="chinese", evolutions=[simple, multi_context, conditional, reasoning])
generator.save(evolutions=[simple, reasoning, multi_context, conditional])

testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    is_async=False,
)

testset.to_pandas()

testset.to_pandas().to_csv(r"RAGAS\output.csv", index=False)
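With `separators=["###"]` and no fallback separators, `RecursiveCharacterTextSplitter` has nothing else to split on, so (as far as I can tell from LangChain's implementation) any "###" section longer than `chunk_size` comes out as a single oversized chunk. Before spending API tokens, a quick standard-library check can show how many sections exceed the 512-character limit. `oversized_sections` is a hypothetical helper, and it only approximates the splitter by splitting on the separator.

```python
def oversized_sections(text, separator="###", limit=512):
    """Return (index, length) for each non-empty section longer than limit.

    Lengths use len(), i.e. Unicode characters, matching length_function=len
    in the config above. Characters are not tokens, so token counts for
    Chinese text can differ noticeably from these numbers.
    """
    sections = [s for s in text.split(separator) if s.strip()]
    return [(i, len(s)) for i, s in enumerate(sections) if len(s) > limit]


sample = "### 一\n" + "短" * 10 + "\n### 二\n" + "长" * 600
print(oversized_sections(sample))  # [(1, 603)]
```

If long sections turn up, adding fallback separators after "###" (for example "\n\n", "\n", "") lets the recursive splitter break them down further.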

Additional context
Anything else you want to share with us?


huangxuyh · 2024-08-10T09:46:49Z

It gets stuck; my guess is that the JSON output is the problem.
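One suggestion in this thread is that the hang comes from the model returning output that doesn't parse as JSON. That isn't confirmed here, and the sketch below is not how ragas parses responses; it only illustrates a tolerant parser (hypothetical `parse_llm_json`) that accepts a bare object, a Markdown-fenced json block, or an object embedded in surrounding prose, and returns None instead of raising so a caller can retry a bounded number of times.

```python
import json
import re

# Three backticks, an optional "json" tag, the body, then three backticks.
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)


def parse_llm_json(reply):
    """Best-effort extraction of a JSON object from an LLM reply; None on failure."""
    candidates = [reply]
    fenced = FENCE_RE.search(reply)
    if fenced:
        candidates.append(fenced.group(1))
    start, end = reply.find("{"), reply.rfind("}")
    if start != -1 and end > start:
        candidates.append(reply[start : end + 1])
    for text in candidates:
        try:
            value = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None


reply = "Sure! Here is the result:\n" + "`" * 3 + 'json\n{"question": "什么是RAG？"}\n' + "`" * 3
print(parse_llm_json(reply))               # {'question': '什么是RAG？'}
print(parse_llm_json("抱歉，我无法回答。"))  # None
```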

hzishan · 2024-09-09T18:51:03Z

It looks to me like an unsupported-encoding problem: "~\miniconda3\envs\my_env\Lib\site-packages\ragas\llms\prompt.py" line 286, encoding="utf-8"
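On Windows, `open()` without an `encoding` argument uses the locale's preferred encoding (for example cp1252, or GBK/cp936 on Chinese-locale systems) rather than UTF-8, so reading or writing files that contain Chinese can fail or garble text. The sketch below is not ragas's code; it shows the general pattern of passing `encoding="utf-8"` on both the write and the read. `save_prompt`, `load_prompt`, `roundtrip` and the prompt fields are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_prompt(path, data):
    # Write UTF-8 explicitly; ensure_ascii=False keeps Chinese readable in the file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4)


def load_prompt(path):
    # Read back with the same explicit encoding, not the platform default.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def roundtrip(data):
    # Save to a throwaway directory and load it again.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "prompt.json"
        save_prompt(path, data)
        return load_prompt(path)


prompt = {"language": "chinese", "instruction": "根据给定的上下文生成一个问题。"}
print(roundtrip(prompt) == prompt)  # True
```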

SugarMoonn · 2025-01-22T03:14:43Z

I'm running into the same problem. Has yours been solved?

jjmachan · 2025-01-22T04:48:09Z

Hey @SugarMoonn, sorry to hear that, but we'll get this working for you.

But is caching working on your end?
Which version of ragas are you using?
How many documents do you have?
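To answer the version question, installed package versions can be printed with the standard library's `importlib.metadata` (Python 3.8+). The package names below are just the ones used in the code from the original post, and `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or "not installed"."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in installed_versions(["ragas", "langchain-openai", "langchain-text-splitters"]).items():
        print(f"{name}: {ver}")
```

For the document count, `print(len(documents))` after `create_documents` gives the number of chunks passed to the generator.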

pp6699 added the "question" label (Further information is requested) on Jun 2, 2024

dosubot bot added the "bug" (Something isn't working) and "module-testsetgen" (Module testset generation) labels on Jun 2, 2024
