Issue with using secret/credentials for S3 endpoint which appear to not come into play when reading from a (remote S3) parquet file #21

Open
mskyttner opened this issue Nov 21, 2024 · 2 comments

@mskyttner

The following statements work in the CLI (DuckDB v1.1.3):

create secret (
	type S3,
	endpoint 'my.objectstorage.minio.org',
	use_ssl 'true',
	url_style 'path',
	key_id 'some_s3_user',
	secret 'some_s3_secret'
);

from read_parquet('s3://projects/project_details.parquet');

In sql-workbench, however, the endpoint and url_style settings do not seem to be taken into account. I assume this because the JS console shows this message:

Access to XMLHttpRequest at 'https://projects.s3.amazonaws.com/project_details.parquet' from origin 'https://sql-workbench.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.

Instead, defaults that assume the Amazon S3 service seem to be active (rather than the S3-compatible MinIO server I was hoping to read from). Both the endpoint and url_style settings from the secret appear to have been ignored: the request goes to a virtual-hosted-style amazonaws.com URL, as if the secret that was created were not used at all.

Would CORS come into play as a blocker here as well?

Tested in recent Firefox and Chromium browsers with the same outcome. But as mentioned, it works in the duckdb CLI.

@tobilg
Owner

tobilg commented Nov 21, 2024

As far as I know, DuckDB-WASM currently doesn't support the SECRETs integration for S3-compatible APIs.

You can theoretically use the s3_* settings instead, e.g.:

SET s3_access_key_id = '...';
SET s3_secret_access_key = '...';
SET s3_session_token = '...';
SET s3_region = '...';
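
For an S3-compatible server such as MinIO, the endpoint and URL style would presumably have to be set the same way. The following is only a sketch: s3_endpoint, s3_url_style and s3_use_ssl are legacy settings of native DuckDB's httpfs extension, and whether DuckDB-WASM honors all of them is an assumption that this thread does not confirm. The values are the placeholders from the secret in the original report.

```sql
-- Legacy (pre-secrets) S3 settings, mirroring the fields of the CREATE SECRET statement.
-- Assumption: DuckDB-WASM applies these the way native DuckDB does.
SET s3_endpoint = 'my.objectstorage.minio.org';
SET s3_url_style = 'path';   -- request endpoint/bucket/key instead of bucket.endpoint/key
SET s3_use_ssl = true;
SET s3_access_key_id = 'some_s3_user';
SET s3_secret_access_key = 'some_s3_secret';

-- The read itself is unchanged.
FROM read_parquet('s3://projects/project_details.parquet');
```

If the browser console still shows a request to an amazonaws.com host after this, the endpoint setting is not being picked up either.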

Additionally, you need to take care of CORS headers for S3 as well, which is not trivial tbh. Even worse with R2.

I'm currently facing similar problems with https://skyfirehose.com/launch

Regarding the CLI, it's expected that it works there: the secrets are wired up, and CORS is enforced only by browsers, so it does not apply.

@mskyttner
Author

Oh, I see, thanks for the info.

I worked around the CORS and remote S3 connectivity issues for now by starting a proxy: the duckdb CLI with the httpserver extension. I start it with a command like DUCKDB_HTTPSERVER_FOREGROUND=1 duckdb -init serve.sql and set up the secret/credentials in a serve.sql file like this one:

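-- discard query output while the init script runs
.mode trash
-- assumes the httpserver extension is already installed
load httpserver;

-- provide credentials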
create secret (
	type S3,
	endpoint 'my.minio.org',
	use_ssl 'true',
	url_style 'path',
	key_id 'some_user_key',
	secret 'some_passphrase'
);

attach database 's3://mybucket/myduck.db' as remotedb;
create view latest as from (select * from remotedb.ids order by datestamp desc limit 20);

select httpserve_start('0.0.0.0', 9999, '');

In sql-workbench I can then read the latest data using:

from read_json_auto('http://localhost:9999/?q=from latest')
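
The query travels in the q URL parameter, so anything longer than a couple of words is probably safer percent-encoded. A minimal sketch, assuming the proxy from serve.sql above is listening on localhost:9999 and decodes the parameter in the usual way; the second query is a made-up example against the same view:

```sql
-- Same request as above, with the space encoded by hand ("from latest" -> "from%20latest").
FROM read_json_auto('http://localhost:9999/?q=from%20latest');

-- Hypothetical longer query: select * from latest limit 5
FROM read_json_auto('http://localhost:9999/?q=select%20%2A%20from%20latest%20limit%205');
```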
