TestsetGenerator.from_langchain Generating failed, randomly stuck at 0% to 80%. #1003

Open
pp6699 opened this issue Jun 2, 2024 · 4 comments
Labels
bug Something isn't working module-testsetgen Module testset generation question Further information is requested

Comments

pp6699 commented Jun 2, 2024

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question
Test set generation with TestsetGenerator.from_langchain fails: the progress bar gets stuck at a random point between 0% and 80%, while tokens from the OpenAI API keep being consumed.

By the way, my embedded documents are in Chinese. Could that be affecting this?

I am a student who has just started programming, and my understanding of related knowledge is limited. I would be very grateful if there were experts who could understand my incomplete questions and help me.

Filename and doc_id are the same for all nodes.
Generating:  70%|██████████████████████████████████████████████████████▌                       | 7/10 [01:29<00:43, 14.33s/it]

Code Examples

import os

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from ragas.testset.evolutions import simple, reasoning, multi_context, conditional
from ragas.testset.generator import TestsetGenerator

os.environ["OPENAI_API_KEY"] = "sk-xxx"

# Raw strings keep the backslash in the Windows paths from being read as an escape.
with open(r"RAGAS\output.md", encoding="utf-8") as f:
    state_of_the_union = f.read()

# Split on the literal "###" separator; chunk_size is a target, measured in characters.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=128,
    length_function=len,
    is_separator_regex=False,
    separators=["###"],
)

documents = text_splitter.create_documents([state_of_the_union])
print(documents[0])

generator_llm = ChatOpenAI(model="gpt-3.5-turbo")
critic_llm = ChatOpenAI(model="gpt-3.5-turbo")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings,
)

# Adapt the evolution prompts to Chinese, then save them.
generator.adapt(language="chinese", evolutions=[simple, multi_context, conditional, reasoning])
generator.save(evolutions=[simple, reasoning, multi_context, conditional])

testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    is_async=False,
)

df = testset.to_pandas()
df.to_csv(r"RAGAS\output.csv", index=False)


@pp6699 pp6699 added the question Further information is requested label Jun 2, 2024
@dosubot dosubot bot added bug Something isn't working module-testsetgen Module testset generation labels Jun 2, 2024
@huangxuyh
It's stuck; my guess is that it's a problem with the JSON output.
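
One way to probe the JSON-output guess is to check whether a raw model reply parses at all. A minimal sketch, assuming the reply is available as a plain string; `extract_json` is a hypothetical debugging helper, not part of ragas, and the sample replies are made up:

```python
import json
from typing import Any, Optional


def extract_json(reply: str) -> Optional[Any]:
    """Return the first JSON object found in `reply`, or None if none parses.

    Hypothetical helper for debugging; not part of ragas.
    """
    decoder = json.JSONDecoder()
    # Try every "{" as a possible start, so surrounding prose
    # or code fences do not break parsing.
    for start, char in enumerate(reply):
        if char != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue
        return obj
    return None


# Made-up replies: one with prose around the JSON, one with no JSON at all.
wrapped = extract_json('好的,结果如下:{"question": "什么是RAG?"} 谢谢')
missing = extract_json("Sorry, I cannot answer that.")
```

If replies like the second one show up often, the generator may be retrying on unparseable output, which would fit both the stall and the token usage; that is a guess, not something confirmed in this thread.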

hzishan commented Sep 9, 2024

From what I can see, it looks like the encoding is not supported: "~\miniconda3\envs\my_env\Lib\site-packages\ragas\llms\prompt.py", line 286, encoding="utf-8"
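
If the encoding is the cause, the usual mechanism is that `open()` without an `encoding` argument falls back to the platform's locale encoding, which on Windows is often a legacy code page rather than UTF-8. A minimal sketch of the difference, using a temporary file and made-up sample text; it does not touch ragas itself, and ASCII stands in here for a mismatched locale codec:

```python
import json
import os
import tempfile

sample = {"question": "什么是检索增强生成?"}  # made-up sample data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "prompt.json")

    # Explicit UTF-8 on both write and read round-trips on any platform.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False)
    with open(path, encoding="utf-8") as f:
        restored = json.load(f)

    # Reading the same bytes with a codec that cannot represent
    # the text raises UnicodeDecodeError.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        ascii_read_ok = True
    except UnicodeDecodeError:
        ascii_read_ok = False
```

Setting the environment variable `PYTHONUTF8=1` before starting Python makes UTF-8 the default for `open()`, which may be a way to test this theory without editing the installed package.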

@SugarMoonn
I ran into the same problem. Has yours been solved?

@jjmachan
Member

Hey @SugarMoonn, sorry to hear that, but we'll get this working for you.

  • but is caching working at your end?
  • which version of ragas are you using?
  • how many documents do you have?
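
The version question can be answered from Python with the standard library. A small sketch; `installed_version` is a hypothetical helper name:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("ragas:", installed_version("ragas"))
```

For the document count, `len(documents)` on the list returned by `create_documents` in the original script gives the number of chunks passed to the generator.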
