Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(idp-extraction-connector): remove double download in idp connector #3869

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

reiballa
Copy link
Contributor

@reiballa reiballa commented Jan 21, 2025

Description

I removed the double document download that happened because i used the asByteArray method. I reworked my solution to just use the input stream.

Related issues

closes #3868

Checklist

  • PR has a milestone or the no milestone label.

@reiballa reiballa self-assigned this Jan 21, 2025
@reiballa reiballa marked this pull request as ready for review January 22, 2025 00:54
@reiballa reiballa requested a review from a team as a code owner January 22, 2025 00:54
@sbuettner
Copy link
Contributor

Hey @reiballa, I took a quick look at it seems like we are reading the full stream to get the content length and then upload it which doesnt use a streaming approach as everything will be stored in-memory.

I did a quick research and the transfer manager which should support streaming uploads seems to be available now as of: aws/aws-sdk-java#474 (comment)

Developer docs: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/transfer-manager.html

@reiballa
Copy link
Contributor Author

reiballa commented Jan 23, 2025

@sbuettner Are you suggesting we look into updating the library? Because i checked what we had now, and ommiting the contentLength wasnt possible:
This is from AsyncRequestBodyFromInputStreamConfiguration

private AsyncRequestBodyFromInputStreamConfiguration(DefaultBuilder builder) {
        this.inputStream = (InputStream)Validate.paramNotNull(builder.inputStream, "inputStream");
        this.contentLength = Validate.isNotNegativeOrNull(builder.contentLength, "contentLength");
        this.maxReadLimit = Validate.isPositiveOrNull(builder.maxReadLimit, "maxReadLimit");
        this.executor = (ExecutorService)Validate.paramNotNull(builder.executor, "executor");
}

@sbuettner
Copy link
Contributor

sbuettner commented Jan 23, 2025

@reiballa

Are you suggesting we look into updating the library?

Yeah, exactly. I think it provides what you are looking for to stream without needing to have the content length beforehand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug downloading documents twice in IdpConnector
2 participants