[GIP] Integrate Apache Superset (dashboarding software) into geOrchestra #10

Open
jeanpommier opened this issue Dec 16, 2024 · 28 comments
Labels: Adopted (this proposal is adopted), GIP, Work in progress (the work initiated by this GIP is in progress)

Comments

@jeanpommier
Member

jeanpommier commented Dec 16, 2024

Who ?

pi-Geosolutions ([email protected]) with funding from geo2france

Target Module

This will be an additional, optional module

What ?

Apache Superset is a tool for building all kinds of dashboards, what is usually called "Business Intelligence" (BI) software. It is a Python Flask application.

image

This proposal is about integrating Superset in geOrchestra:

  • handle HTTP remote user authentication, like other geOrchestra apps
  • manage roles in the console

Superset uses a concept of roles to determine what a registered user can access. This will connect nicely with our roles system. Superset-specific roles will still be managed inside Superset (to determine what a given role gives access to), but the mapping of roles to a given user will be managed in the Console, like for the other apps. It will behave mostly like what we already have in GeoServer, for instance.

Why ?

We don't really have a BI tool right now. It is possible to build dashboards in MapStore, but this is a bit of a stretch when working on data that has no geospatial aspect. And I believe Superset is much richer in what it can do.

Adding an optional BI tool seems like a good asset for geOrchestra. Some platforms are already using it (geo2france, but also geobretagne, which is experimenting with integrating Superset and mviewer).

Also, Superset is seriously considered for the visualization part of the new Analytics module, to come soon.

How ?

For user management, Superset relies on Flask-AppBuilder, which supports an HTTP remote-user mode. It is possible to add some custom logic, which should allow us to also retrieve the roles from the HTTP headers and update the user's profile accordingly (live).
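As a rough illustration of the "retrieve the roles from the HTTP headers" step, here is a minimal sketch in plain Python. It assumes the gateway forwards the user name and roles in `sec-username` and `sec-roles` headers, with roles separated by `;`. The header names, the separator and the `RemoteIdentity` / `identity_from_headers` names are assumptions made for the example, not the actual integration code; in a real deployment this logic would presumably be called from a custom security manager used in remote-user mode.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# Assumed header names and separator; adjust to what the gateway really sends.
USERNAME_HEADER = "sec-username"
ROLES_HEADER = "sec-roles"
ROLES_SEPARATOR = ";"


@dataclass(frozen=True)
class RemoteIdentity:
    """Identity forwarded by the proxy (hypothetical helper type)."""
    username: str
    roles: Tuple[str, ...]


def identity_from_headers(headers: Mapping[str, str]) -> Optional[RemoteIdentity]:
    """Return the forwarded identity, or None when the request is anonymous."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    username = lowered.get(USERNAME_HEADER, "").strip()
    if not username:
        return None
    raw_roles = lowered.get(ROLES_HEADER, "")
    # Split the role list and drop empty entries (e.g. a trailing separator).
    roles = tuple(r.strip() for r in raw_roles.split(ROLES_SEPARATOR) if r.strip())
    return RemoteIdentity(username=username, roles=roles)
```

Re-reading the headers on every request, rather than only at first login, is what would keep the user's profile in sync "live" with the Console.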

It should (should) be a fairly simple integration. It should not impact the rest of the platform in any way (except, of course, adding a route in the gateway). Even on Superset's side, the implementation will most likely not touch the core app, but live in a few "config" files that can be shipped alongside the core app, ensuring simple maintenance.

Any potential pitfalls and ways to circumvent them ?

Apart from adding more complexity in the whole geOrchestra architecture, I don't see any.

Oh, yes, one: Superset's doc now mostly deals with a dockerized deployment. There is very little doc on how to deploy it without docker. But I got it working, we should be able to provide an Ansible deployment without too much hassle (I might need a bit of help though, since I don't know much about ansible)

When ?

I plan to have it operational by January 2025. It should be possible to beta-test it around Christmas.

State of the vote:

| PSC member | Vote |
|---|---|
| Fabrice Phung | |
| François Van Der Biest | +1 |
| Pierre Mauduit | +1 |
| Landry Breuil | +1 |
| Stéphane Mével-Viannay | +1 |
| Maël Reboux | +1 |
| Pierre Jégo | 0 |
| Jean Pommier | +1 |
| Catherine Piton-Morales | +1 |
@jeanpommier added the "GIP", "1 - Pending" and "2 - In review" labels and removed the "1 - Pending" label on Dec 16, 2024
@fvanderbiest
Member

Looks good to me !

@MaelREBOUX
Member

For illustration, Superset could come with out-of-the-box statistics like the ones below (linked to https://github.com/georchestra/analytics/).

Number of unique users, per week

image

users by years

image

type of OGC services consumption per day

image

@landryb
Member

landryb commented Dec 17, 2024

But I got it working, we should be able to provide an Ansible deployment without too much hassle (I might need a bit of help though, since I don't know much about ansible)

i had a first look last week, and it shouldnt be that hard once (sorry for being technical in a GIP, but that's how i work..) :

  • we know how/where to integrate it (/superset/ ? subdomain ?)
  • we have the bits to do the http header auth/roles auth behind sp/gw (im not ditching the sp)
  • we know the config bits about
  • we know how we want to bootstrap it (create user/default roles, populate db schema)

it can be installed from pypi and this method is supported upstream

some bits could be taken from https://github.com/onaio/ansible-superset/ but thats unmaintained and wayyys too much configurable, we can have something much simpler

all that to say, +1 for me. The real value is not in 'having the tool' but 'providing sample dashboards that replace analytics' like the ones @MaelREBOUX showed....

@jeanpommier
Member Author

jeanpommier commented Dec 17, 2024

Right now, it reads the roles from the Console, filters them to keep only the ROLE_SUPERSET_SOMETHING roles, and checks for a match in Superset's roles to see if there is a Something role (case-insensitive). Seems to work well.
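The matching described here can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the function name `match_superset_roles` is made up, and it assumes the prefix itself is matched exactly while the remaining part is compared case-insensitively.

```python
from typing import Iterable, List

ROLE_PREFIX = "ROLE_SUPERSET_"  # prefix mentioned in the comment above


def match_superset_roles(console_roles: Iterable[str],
                         superset_roles: Iterable[str]) -> List[str]:
    """Return the Superset role names matched by ROLE_SUPERSET_* Console roles."""
    # Index Superset's role names by lowercase form for case-insensitive lookup.
    by_lower = {name.lower(): name for name in superset_roles}
    matched: List[str] = []
    for role in console_roles:
        if not role.startswith(ROLE_PREFIX):
            continue  # not a Superset-related Console role
        suffix = role[len(ROLE_PREFIX):]
        target = by_lower.get(suffix.lower())
        # Keep each matched role once, in the order the Console listed them.
        if target is not None and target not in matched:
            matched.append(target)
    return matched
```

Console roles with no counterpart in Superset are silently ignored in this sketch; logging them might be preferable in practice.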

Question: I'm considering also adding a support based on the user's organization. Like for instance, user A belongs to org Geo2france, then it would also try to match with a possible O_Geo2france role in superset. Is it relevant ? Interesting ? Overkill ?

The idea is to avoid overloading the console with many roles that can be inferred in another way. The drawback is that it creates a mapping that is not obvious at first glance.

I can also make this optional.
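To make the idea concrete, a minimal sketch of such an optional organization-based lookup could look like this. The `org_role_for` name, the `enabled` flag and the case-insensitive comparison are assumptions made for the example; only the `O_` prefix comes from the suggestion above.

```python
from typing import Iterable, Optional

ORG_ROLE_PREFIX = "O_"  # prefix suggested above, e.g. O_Geo2france


def org_role_for(org: Optional[str],
                 superset_roles: Iterable[str],
                 enabled: bool = True) -> Optional[str]:
    """Return the Superset role matching the user's organization, if any."""
    # The feature is optional: do nothing when disabled or when no org is known.
    if not enabled or not org:
        return None
    wanted = (ORG_ROLE_PREFIX + org).lower()
    for name in superset_roles:
        # Assumed case-insensitive, mirroring the ROLE_SUPERSET_* matching.
        if name.lower() == wanted:
            return name
    return None
```

With this shape, a user from org Geo2france only gets an extra role if an administrator has actually created `O_Geo2france` in Superset, so nothing changes for platforms that do not use the convention.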

@fphg
Member

fphg commented Dec 22, 2024

Very useful and awaited evolution, thanks

Some considerations :

  • as a parametric display of data, a graph is very complementary to WMS. We shall ensure that the default behavior is read-only for anonymous users, as in GeoServer (the default behavior for Superset is forbidden)
  • the TALISMAN filters won't allow embedding for foreign domains, this can be a pain. can you easily change the domain whitelist ?
  • you suggest auth with headers ; we also could make use of a native openid support. benefits / drawbacks ?

@landryb
Member

landryb commented Dec 27, 2024

  • we know how/where to integrate it (/superset/ ? subdomain ?)

coming back to this, i have a start of ansible role to deploy it, and having it behind a subdir looks.. complicated without going through gory hacks, and unsupported upstream. Cf apache/superset#24823, and https://github.com/komoot/superset-reverse-nginx-example among others.

how do you have it running ? superset.georchestra.example.org ?

@jeanpommier
Member Author

Yup. That was a blind spot when I estimated the work to do. I considered it such a basic feature that I forgot to check.

There is a PR that has a good chance of getting into master (apache/superset#30134). It has quite some community support and upstream devs seem OK to consider it, but it is still a bit young to be certain. I'm betting on it and will support it. As a first step (in the next few months), we will probably have to use a fork if we want this feature.

I'm also investigating the code. Flask comes with a blueprints module, which is specifically made to support this use case, but the way Flask-AppBuilder implements it is a bit weird. And the Superset folks, well, I'm not sure they remember they had this. Anyway, the JS frontend does not support it. And most of the backend doesn't seem to either.

The alternative being to use a separate subdomain (I personally don't favour this).
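For context, the generic WSGI technique for serving an application under a sub-path is to move the prefix from `PATH_INFO` to `SCRIPT_NAME`, so that the app generates URLs relative to the prefix. A minimal sketch, where the `PrefixMiddleware` name and the `/superset` prefix are purely illustrative. As noted above, this only changes what the backend sees; it does nothing for a frontend built with a hard-coded base path, which is why it is not sufficient for Superset on its own.

```python
class PrefixMiddleware:
    """Serve a WSGI app under a URL prefix (illustrative sketch)."""

    def __init__(self, app, prefix):
        self.app = app
        # Normalise to a single leading slash and no trailing slash.
        self.prefix = "/" + prefix.strip("/")

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path == self.prefix or path.startswith(self.prefix + "/"):
            # Move the prefix from PATH_INFO to SCRIPT_NAME so that the
            # wrapped app routes on the remainder and builds prefixed URLs.
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + self.prefix
            environ["PATH_INFO"] = path[len(self.prefix):] or "/"
            return self.app(environ, start_response)
        # Anything outside the prefix is not ours to serve.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
```

A Flask application would typically be wrapped with something like `app.wsgi_app = PrefixMiddleware(app.wsgi_app, "/superset")`.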

@landryb
Member

landryb commented Dec 30, 2024

looking at the PR, i see that changing the base path requires 'rebuilding' the JS frontend.. oh well. modern web dev..

@jeanpommier
Member Author

Well, yes, that's one of the things I'd like to act on. I don't see why the frontend couldn't read a config file provided by the backend, instead of rebuilding based on a hard-coded path... But that won't be a short-term feature ;o).

@jeanpommier
Member Author

* as a parametric display of data, a graph is very complementary to wms. we shall ensure that the default behavior is readonly for anonymous users, as in geoserver (default behavior for superset is forbidden)

Noted. Makes sense to me. This is authorization config though (done through roles), so it might vary depending on the platform. Not really an integration topic IMO.
I'll have a look though, and we can at least provide instructions on how to achieve such results (I remember it was hard to do)

* the TALISMAN filters won't allow embedding for foreign domains, this can be a pain. can you easily change the domain whitelist ?

Didn't know about that, but I'd say it is covered in https://superset.apache.org/docs/security/#content-security-policy-csp, isn't it ?
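If the policy is expressed as a Talisman-style dictionary with a `content_security_policy` entry (which is how I read the linked docs), whitelisting extra domains for embedding would come down to extending the `frame-ancestors` directive. A minimal sketch, where the `allow_embedding` helper and the example origin are made up for illustration:

```python
import copy


def allow_embedding(talisman_config, origins):
    """Return a copy of the config whose CSP lets `origins` frame the app."""
    config = copy.deepcopy(talisman_config)  # leave the input untouched
    csp = config.setdefault("content_security_policy", {})
    current = csp.get("frame-ancestors", ["'self'"])
    # A directive may be given as a space-separated string or as a list.
    ancestors = current.split() if isinstance(current, str) else list(current)
    for origin in origins:
        if origin not in ancestors:
            ancestors.append(origin)
    csp["frame-ancestors"] = ancestors
    return config


# Example use, with a placeholder base policy and partner domain.
base = {"content_security_policy": {"default-src": ["'self'"]}}
updated = allow_embedding(base, ["https://partner.example.org"])
```

The resulting dictionary would then be assigned to the relevant setting in Superset's configuration file; the rest of the policy is left exactly as it was.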

* you suggest auth with headers ; we also could make use of a native openid support. benefits / drawbacks ?

Auth with headers is geOrchestra's basic behaviour. For me, this is implied in "integration in geOrchestra". Using OpenID would rather be "deploy alongside geOrchestra".

Main benefit I see is to be able to assign roles to users in the geOrchestra console, like for the other integrated apps.

@jeanpommier added the "3 - Ready for votes" label and removed the "2 - In review" label on Jan 2, 2025
@MaelREBOUX
Member

I voted : +1

@jeanpommier
Member Author

+1 ;o)

@fvanderbiest
Member

+1 too

@catmorales

+1

2 similar comments
@pmauduit
Member

pmauduit commented Jan 9, 2025

👍

@fphg
Member

fphg commented Jan 9, 2025

+1

@fphg
Member

fphg commented Jan 9, 2025

Since this GIP is kind of a big evolution, we shall clarify its compatibility with the manifest

"geOrchestra is a spatial data infrastructure project whose founding characteristics are: free, modular, interoperable. This project is community driven."

  • free : OK since superset is apache 2.0
  • modular : 2 POV.
    • 1: the intended feature is optional; true?
    • 2: the intended feature may be delivered by a different software, if the instance already has one or chooses a different one; true?
  • interoperable. how can we describe superset interoperability in georchestra ?

Note: analytics (other GIP) and this GIP are connected, since analytics will use Superset. This is not a problem for modularity: analytics can be Superset-dependent, letting the admin choose another dataviz system for regular data.

@fphg closed this as completed on Jan 9, 2025
@pierrejego
Member

Great GIP,
Dashboard is necessary

I vote +0 just because I never took time to check superset vs other choice, but sure you did :)

@MaelREBOUX reopened this on Jan 10, 2025
@MaelREBOUX added the "Adopted" and "Work in progress" labels and removed the "3 - Ready for votes" label on Jan 10, 2025
@jeanpommier
Member Author

Thank you all.

I'm going to create a git repo for this feature, that will contain documentation and custom code.

I want to name the repo after the path that we will give to access superset visualizations. This is something that was already a bit discussed with @landryb when he was investigating deployment with ansible. We went for /dashboards. Does it seem right for you ?

Important detail about this: for now, Superset does not allow this path to be changed dynamically. Changing the path means rebuilding the JS frontend. So it's probably better if we settle on a name/path that is common to all geOrchestra instances.

@fvanderbiest
Member

We went for /dashboards. Does it seem right for you ?

LGTM :-)

@landryb
Member

landryb commented Jan 14, 2025

i would then expect a repo named dashboards to also contain dashboards demo/samples ready to import :)

@jeanpommier
Member Author

i would then expect a repo named dashboards to also contain dashboards demo/samples ready to import :)

That will probably come when I deal with analytics.
For now, it will be hard to provide dashboard examples over datasets that don't exist. But we can probably design some dashboards tapping into remote data.

In my mind, the repo dashboards is mostly meant to contain doc and code to deploy the dashboards functionality (software)

@jeanpommier
Member Author

jeanpommier commented Jan 14, 2025

But then, as it is an open source project, you'll of course be welcome to contribute dashboard demos/samples ;o)

@fvanderbiest
Member

Proposing /explore for the module path, to prevent duplication of dashboard or superset in the paths.

@jeanpommier
Member Author

jeanpommier commented Jan 16, 2025

Also taken ! See https://github.com/apache/superset/blob/master/superset-frontend/src/views/routes.tsx#L205

What about /analyze ?

Or, since this kind of dataviz is also called "Business Intelligence", /bi ?

@fvanderbiest
Member

Also taken ! See https://github.com/apache/superset/blob/master/superset-frontend/src/views/routes.tsx#L205

OMG, now I understand why they usually take a full domain.

@fvanderbiest
Member

What about /analyze ?

Or /analytics ?
analytics is not only restricted to getting figures on the service usage

@edevosc2c
Member

edevosc2c commented Jan 17, 2025

Some naming ideas that I had:

  • laboratory: In Superset, you create dashboards to analyze and study data from various sources. In a laboratory, you do the same thing (except that it's usually on restricted data)
  • spacecontrol: Related to what you see in the movies, where there are usually many dashboards all over the screens showing a lot of data at the same time. Using Superset in front of another person could give them the same feeling. And space control is the place to monitor the spaceship: the spaceship here is geOrchestra, and you need to steer it in the right direction through decisions made from visualizing the dashboards.
  • datadig: The action of digging through the data to find what you are looking for, using various dashboards to visualize it. Somewhat similar to the name Datadog (I hope it's OK legal-wise)
  • landscape: Through Superset you can see the various landscapes of beautiful dashboards created out of the various data available in the lands of Databases.

(Sometimes the name of a project doesn't need to have a direct relation to what people think it is supposed to be; sometimes an easy name that sounds good will stick better in people's minds than a complicated name that exactly describes the meaning of the project.)

9 participants