[Bug]: CodeSplitter not working #17567

Open
Bavalpreet opened this issue Jan 21, 2025 · 3 comments
Labels
bug (Something isn't working), triage (Issue needs to be triaged/prioritized)

Comments

@Bavalpreet

Bug Description

from llama_index.core.node_parser import CodeSplitter

splitter = CodeSplitter(
    language="python",
    chunk_lines=40,  # lines per chunk
    chunk_lines_overlap=15,  # lines overlap between chunks
    max_chars=1500,  # max chars per chunk
)
nodes = splitter.get_nodes_from_documents(md_nodes)

Error
Could not get parser for language python. Check https://github.com/grantjenks/py-tree-sitter-languages#license for a list of valid languages.

TypeError Traceback (most recent call last)
in <cell line: 0>()
1 from llama_index.core.node_parser import CodeSplitter
2
----> 3 splitter = CodeSplitter(
4 language="python",
5 chunk_lines=40, # lines per chunk

/usr/local/lib/python3.11/dist-packages/llama_index/core/node_parser/text/code.py in __init__(self, language, chunk_lines, chunk_lines_overlap, max_chars, parser, callback_manager, include_metadata, include_prev_next_rel, id_func)
75 import tree_sitter_languages # pants: no-infer-dep
76
---> 77 parser = tree_sitter_languages.get_parser(language)
78 except ImportError:
79 raise ImportError(

tree_sitter_languages/core.pyx in tree_sitter_languages.core.get_parser()

tree_sitter_languages/core.pyx in tree_sitter_languages.core.get_language()

TypeError: __init__() takes exactly 1 argument (2 given)

Version

latest

Steps to Reproduce

Run the snippet shown under Bug Description; it produces the same error message and traceback.

Relevant Logs/Tracebacks

Bavalpreet added the bug and triage labels on Jan 21, 2025

dosubot bot commented Jan 21, 2025

To resolve the "Could not get parser for language python" error and the TypeError: __init__() takes exactly 1 argument (2 given) when using the CodeSplitter from llama_index.core.node_parser, follow these steps:

  1. Ensure tree_sitter_languages is Installed and Up-to-Date:

    • Make sure the tree_sitter_languages package is installed in your environment. You can install it using pip:
      pip install tree_sitter_languages
    • Verify that the package is up-to-date to ensure compatibility with the latest features and bug fixes.
  2. Check Language Support:

  3. Verify Parser Initialization:

    • The error TypeError: __init__() takes exactly 1 argument (2 given) suggests there might be an issue with how the parser is being initialized. Ensure that the get_parser function is being called correctly and that it returns a valid tree_sitter.Parser object.
  4. Custom Parser Object:

    • If the automatic parser retrieval is not working, you can manually create and pass a parser object to the CodeSplitter:
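      import tree_sitter_languages
      from llama_index.core.node_parser import CodeSplitter

      # Note: this is the same get_parser call that raises in the traceback above.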
      parser = tree_sitter_languages.get_parser("python")
      splitter = CodeSplitter(
          language="python",
          chunk_lines=40,
          chunk_lines_overlap=15,
          max_chars=1500,
          parser=parser
      )

By following these steps, you should be able to resolve the errors and use the CodeSplitter with Python successfully [1].
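
Before trying any of the steps above, it can help to record which versions are actually installed, since "latest" in the report does not pin anything down. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the distribution names `llama-index-core`, `tree-sitter`, and `tree-sitter-languages` are assumed to be the PyPI names of the packages involved.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed PyPI distribution names; adjust if your environment differs.
    wanted = ["llama-index-core", "tree-sitter", "tree-sitter-languages"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output into the issue makes it easier to tell whether the failure depends on a particular combination of package versions.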


@logan-markewich
Collaborator

@Bavalpreet Newer versions of tree-sitter introduced a breaking change. Downgrade your tree-sitter package. It seems no one wants to maintain that package.

Install a version lower than 0.22.
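
One way to check whether an environment is affected is to compare the installed version against the 0.22 threshold mentioned above. This is a rough sketch, not part of llama_index or tree-sitter: the helper names `parse_major_minor` and `needs_downgrade` are hypothetical, and it assumes the advice refers to the `tree-sitter` distribution on PyPI, in which case something like `pip install "tree-sitter<0.22"` would be the corresponding downgrade.

```python
import re
from importlib import metadata


def parse_major_minor(version):
    """Extract (major, minor) as integers from a version string like '0.21.3'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_downgrade(version, limit=(0, 22)):
    """Return True when the version is at or above the limit (0.22 by default)."""
    return parse_major_minor(version) >= limit


if __name__ == "__main__":
    try:
        # Assumption: the PyPI distribution is named "tree-sitter".
        current = metadata.version("tree-sitter")
    except metadata.PackageNotFoundError:
        print("tree-sitter is not installed")
    else:
        status = "needs downgrade" if needs_downgrade(current) else "below 0.22"
        print(f"tree-sitter {current}: {status}")
```

The comparison only looks at the major and minor components, which is enough for a threshold like 0.22 and avoids a dependency on a third-party version parser.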
