Secondary provider identifier #33
Comments
Here is the Spark perspective, which most likely applies to any other networks performing retrieval testing based on on-chain data.
Currently:
It may be worth exploring what this means for retrieval testing in general.
A bigger question is how much can retrieval checkers trust the MinerID-to-ProviderID mapping.
A potential solution I see - let's discuss if it's viable?
I don't know whether this makes the implementation any simpler on the Curio side. Also, this is viable only if Curio provides the same retrievability for all deals, irrespective of which miner ID they are linked to.
@LexLuthr and @masih tagged me on this issue, likely because of my interest in #20 and my view that the binding of provider IDs to peerIDs currently in place is not a good idea. I'll try to give my understanding of the current situation for IPNI and IPFS, as well as of what's needed with Curio + Spark, which isn't necessarily the same. TLDR on recommendations:
My current understanding (but someone like @masih should double check me) is that there are two reasons for the current structure of having a provider PeerID + a set of (multi)addresses.
These are both fairly unimportant reasons though because:
The downside of operating this way is mostly the inconvenience of bogus libp2p peerIDs sometimes being added to multiaddrs in ways that are confusing. For example, when encountering /dns/foo.tld/tcp/443/https/p2p/12D3Foobar, are you supposed to drop the /p2p/... component because you know it was added as a hack, or does it indicate that you're trying to do libp2p over HTTP with PeerID auth? Overall this makes using IPNI for content routing with systems that do not use libp2p peerIDs (e.g. CA-authenticated HTTPS addresses, HTTP to Tor hidden services, BitTorrent, ...) more painful and hacky for no real benefit.
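To make the ambiguity concrete, here is a minimal sketch of how a consumer might detect a trailing /p2p/ component in the textual form of a multiaddr. It uses plain string handling only, not a multiaddr library, and the function name splitTrailingPeerID is hypothetical. Note that it can only tell you the component is there; it cannot tell you whether the publisher meant it as a hack or as real PeerID auth, which is exactly the problem.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTrailingPeerID separates a trailing /p2p/<peerID> component from the
// textual form of a multiaddr. It reports ok=false when the address does not
// end in such a component. (Hypothetical helper, string handling only.)
func splitTrailingPeerID(addr string) (base, peerID string, ok bool) {
	const marker = "/p2p/"
	i := strings.LastIndex(addr, marker)
	if i < 0 {
		return addr, "", false
	}
	id := addr[i+len(marker):]
	// The peer ID must be the final component: non-empty and with no
	// further path segments after it.
	if id == "" || strings.Contains(id, "/") {
		return addr, "", false
	}
	return addr[:i], id, true
}

func main() {
	base, id, ok := splitTrailingPeerID("/dns/foo.tld/tcp/443/https/p2p/12D3Foobar")
	fmt.Println(base, id, ok)
}
```

A caller still has to decide what to do with the split result, and that decision needs out-of-band knowledge about the publisher.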
I'd wonder what @masih and @willscott think, but as I understand it:
@LexLuthr Having more information about how you need this mapping to show up / be used would likely make discussion easier.
@bajtos let's back up a bit to consider what the attack model is, before figuring out the solution. Some examples:
This seems resolvable by keeping peerIDs / public keys as the identifiers for the advertisement chain itself, but still figuring out a way to associate arbitrary data with a provider (e.g. the metadata field or something else). Using something other than a cryptographic key here to identify the mutable data that is the advertisement chain seems like a bigger ordeal (e.g. it looks a lot like the entire DID space).
I will try to answer all the questions directed at me as best I can. In case I have missed something, please feel free to tag me.
Note: HTTP over libp2p != libp2p over HTTP
Perhaps this is my own cluelessness about Spark, but how is this not abusable? If Spark wants to prove "minerX has advertised CID Y to IPNI and it's downloadable from an endpoint controlled by minerX", then there needs to be some proof binding minerX to the peerID (i.e. not just some text mapping) and some proof binding the HTTP endpoint to either the peerID or the minerID. It sounds like both are missing. You could relax the condition and say minerX doesn't have to advertise their CIDs as long as somebody out there advertises that minerX has CID Y at an endpoint minerX controls. Doing this would mean an "advisory" mapping of peerID -> minerX in IPNI could be OK, but it comes with potential added work / attack surface for Spark: someone who isn't minerX could also publish an advisory mapping, but to an endpoint that doesn't resolve properly.
Thanks for clarifying this.
Maybe something like the below. It would allow more flexibility around what this extra binding info can be.

    type ExtraMetadataType string

    const (
        FilecoinSP ExtraMetadataType = "miner"
        IPFS       ExtraMetadataType = "IPFS"
        // ... extend as required
    )

    type ExtraMetadata struct {
        Type ExtraMetadataType
        Data []byte
        // Presumably the Signature type from Filecoin's
        // go-state-types/crypto package.
        Sig crypto.Signature
    }

    // GetType reports which kind of binding this metadata carries.
    func (e ExtraMetadata) GetType() ExtraMetadataType {
        return e.Type
    }
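As a rough illustration of how such a signed metadata entry could be produced and checked, here is a self-contained sketch. Several things in it are assumptions rather than part of the proposal: the signature is a raw byte slice made with Ed25519 from the Go standard library (a stand-in for whatever key type would really be used), the signed bytes are the type followed by a zero byte and the data, and the names MinerType, signedPayload, NewExtraMetadata, Verify and demo are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

type ExtraMetadataType string

// MinerType is a hypothetical type tag for Filecoin miner bindings.
const MinerType ExtraMetadataType = "miner"

// ExtraMetadata mirrors the proposed struct, with Sig simplified to raw bytes.
type ExtraMetadata struct {
	Type ExtraMetadataType
	Data []byte
	Sig  []byte
}

// signedPayload builds the bytes covered by the signature: the type tag, a
// zero separator, then the data. This layout is an assumption of the sketch.
func signedPayload(t ExtraMetadataType, data []byte) []byte {
	p := append([]byte(t), 0)
	return append(p, data...)
}

// NewExtraMetadata signs the type and data with the given private key.
func NewExtraMetadata(priv ed25519.PrivateKey, t ExtraMetadataType, data []byte) ExtraMetadata {
	return ExtraMetadata{Type: t, Data: data, Sig: ed25519.Sign(priv, signedPayload(t, data))}
}

// Verify reports whether Sig is a valid signature over Type and Data.
func (e ExtraMetadata) Verify(pub ed25519.PublicKey) bool {
	return ed25519.Verify(pub, signedPayload(e.Type, e.Data), e.Sig)
}

// demo signs an entry, verifies it, then verifies it again after tampering.
func demo() (valid, tampered bool) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	md := NewExtraMetadata(priv, MinerType, []byte("f01234")) // made-up miner ID
	valid = md.Verify(pub)
	md.Data = []byte("f09999") // change the data without re-signing
	tampered = md.Verify(pub)
	return valid, tampered
}

func main() {
	valid, tampered := demo()
	fmt.Println("original verifies:", valid, "tampered verifies:", tampered)
}
```

Covering the type tag as well as the data means a signature made for one kind of binding cannot be reused under a different type.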
Not an IPNI maintainer or anything, but IMO requiring IPNI nodes to understand Filecoin seems like a bad idea / mismatch of concerns. IPNI does not care at all about the minerID; Spark cares about the minerID, so it seems like those systems should negotiate the relationship. This is a way bigger ask than an extra metadata field: it's asking IPNI nodes to understand Filecoin and run their own nodes and/or outsource to some trusted RPC provider.
My suggestion wasn't to use it for Curio<>IPNI, but to use it between Curio<>Spark because my understanding is that Spark needs some way to know that the HTTP endpoint Alice advertises belongs to her and she's not just pointing you at Bob's endpoint. There is a flaw here in that I assumed that there was a mapping of minerID -> peerID somewhere that Spark could trust. This seems to indicate that you can either:
I agree that the IPNI protocol should not become Filecoin-specific. But it is libp2p-specific right now. This is a problem for anyone trying to get away from libp2p. The next iteration of the deal protocol will be pure HTTP. The on-chain libp2p peerID won't matter after that.
Spark is another retrieval client for Curio. Curio does not distinguish between who requested what from which minerID. You request some data, and if Curio has it then it will respond. Charlie can retrieve deals made with both Alice and Bob using the same endpoint. This is by design. This is HTTP retrieval for full pieces and an IPFS gateway.
The MinerID to IPNI Provider PeerID mapping is internal to Curio at the moment. There is no existing format on chain that we can use to update it.
HTTP addresses are per cluster and not per minerID, so the on-chain addresses of multiple minerIDs can be the same.
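A small sketch of what this means for a client: if you invert a minerID-to-endpoint table, one endpoint can map back to several minerIDs, so the address alone cannot tell you which miner is serving. The miner IDs and URLs below are made up, and minersByEndpoint is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// minersByEndpoint inverts a minerID -> HTTP endpoint table. Each endpoint
// maps to the sorted list of miner IDs that announce it.
func minersByEndpoint(endpoints map[string]string) map[string][]string {
	out := make(map[string][]string)
	for miner, ep := range endpoints {
		out[ep] = append(out[ep], miner)
	}
	for ep := range out {
		sort.Strings(out[ep]) // map iteration order is random; sort for stable output
	}
	return out
}

func main() {
	// Hypothetical on-chain addresses: two miners share one cluster endpoint.
	endpoints := map[string]string{
		"f01000": "https://cluster.example.com",
		"f01001": "https://cluster.example.com",
		"f02000": "https://other.example.com",
	}
	for ep, miners := range minersByEndpoint(endpoints) {
		fmt.Println(ep, "->", miners)
	}
}
```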
I am not sure why this particular attack vector is important, or maybe I am misunderstanding it. Unless I am wrong here, all Spark does is verify retrievability. It should not matter how the backend serves the data or from which source. The only thing we should care about is that I looked up on IPNI a piece which was sealed with Alice, I got the address to retrieve said piece (or part of a piece), and I was able to retrieve it. It doesn't matter if it was served from some sector Bob might be holding.
Maybe I'm not the one understanding Spark's purpose and so someone will correct me, but IIUC it matters to Spark who is serving the data. Aside from one Curio instance being able to back many miners with different IDs, if Spark's goal is to figure out which SPs are serving data well I can do the following:
Can you walk me through the libp2p-specific parts? Below I've tried to list every place I can recall libp2p being used within IPNI and almost everything seems optional, and certainly anything at the transport layer looks optional.
So what's the libp2p-specific thing you're concerned with? The only thing you're actually stuck with is using the libp2p peerID format for encoding public keys instead of a different public key encoding format or pushing for to be used instead. Is it just the inelegance of the libp2p peerID format?
Spark doesn't announce which data it will look up and when. So how can an SP which is serving the retrievals be a bad SP? As an SP, I should have every right to save space and bandwidth as long as I don't compromise on the quality of the service provided. Another thing: all minerIDs served by Curio have the same peerID for making deals, i.e. the on-chain peerID. So, again, there is no way to know who signed the data. Forcing SPs to have separate keys just to sign retrievals seems too much.
I would love it if this identifier could be an arbitrary public key. But if I look at the code, it is not. That is what makes this libp2p-specific. If we can make this use bls3 or other public keys along with the peerID, then Curio or other providers can simply sign with worker wallets (or other relevant keys). This would make the whole thing cleaner and make it easier to look things up on chain.
GossipSub is deprecated AFAIK. New versions all use http-libp2p or HTTP. Curio is an HTTP-only provider.
This works in Boost right now without any TLS or domain name. All data is still signed by the libp2p key for data auth on the indexer side. In fact, GossipSub performance was really bad with Boost. All of our users using libp2p-only announcements had sync issues. Some of them were not even found by the indexer.
We mostly agree on how things should work. Maybe just supporting more formats is the solution.
Hey, great discussion! Regarding the Spark attack vector, where SPs delegate serving retrievals to other SPs. It is a valid attack vector, but the impact is very low right now - most SPs don't serve retrievals at all (less than 15% of retrieval checks succeed), and from what we have seen, people operating SPs are not sophisticated enough to deploy such a solution. They are struggling to even properly configure Boost + booster-http + IPNI integration. From our perspective, we need to have an idea of how to mitigate this attack vector in the future (6+ months), but we don't need the solution to be designed & implemented right now. Potential options I see:
I have a slightly different view.
Checking whether "minerX has advertised CID Y to IPNI" is good enough for now, as far as we are concerned. Checking whether "it's downloadable from an endpoint controlled by minerX" is one of many improvements we will eventually need to implement, and we need to prioritise it relative to the other improvements needed.
Linking MinerId to IndexProvider PeerID
In Spark, we have only two requirements:
We are open-minded about which solution to use. Spark can support multiple ways of linking miners to index/retrieval providers, if necessary.
I don't see why it seems too much to ask Curio to have a unique IPNI provider ID for each miner it serves. Having said that, I don't have a strong opinion. If we can find a solution that works within your constraints, then we can adopt it. Cross-posting from a Slack discussion thread in the
Based on all that has been written so far, I'd like to propose the following solution. (1) (2) The signature must be over a data structure that includes Spark can obtain the owner/worker/control wallet address using the RPC API method Filecoin.StateMinerInfo. Curio can produce the signature using the existing infrastructure for signing messages with the owner/worker/control wallet. (Is this feasible & reasonably easy to implement?) The obvious downside is metadata size: for each miner listed in the index provider metadata, we need to include the signature (64 bytes when using ECDSA + secp256k1). @masih what is the (practical) limit on how many bytes index providers can put into boost extended providers or metadata at root ad?
Depending on the timing and the outcome of this discussion, we may want to include the results in my FRC documenting Retrieval Checking Requirements - see filecoin-project/FIPs#1089
Taking a step back: the simple requirement here is to map Peer ID to miner ID. The simple solution to this is to publish the miner ID as a metadata value of the top level ad. What have I missed?
That works for us (retrieval checking & Spark). However, we need measures to prevent an adversary index provider from claiming a miner ID they don't control. See the second half of my comment #33 (comment) for more details and a possible solution. I guess we need to hear from @LexLuthr @steven004 whether such a solution is feasible for Curio and Venus Droplet.
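To illustrate the concern, here is a minimal sketch of a miner claim that a checker could verify. It rests on assumptions that are not confirmed anywhere in this thread: the signed payload covers both the miner ID and the index provider peer ID, Ed25519 from the Go standard library stands in for a Filecoin wallet signature (which would really be secp256k1 or BLS), and the wallet public key is assumed to have been resolved from chain state already. All type and function names, IDs and the payload layout are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// MinerClaim is what an index provider would publish in its metadata.
type MinerClaim struct {
	MinerID        string
	ProviderPeerID string
	Sig            []byte
}

// claimPayload builds the signed bytes. Including the peer ID ties the claim
// to one specific index provider. The layout is an assumption of the sketch.
func claimPayload(minerID, peerID string) []byte {
	return []byte("miner-claim\x00" + minerID + "\x00" + peerID)
}

// SignClaim is run by whoever holds the miner's wallet key.
func SignClaim(wallet ed25519.PrivateKey, minerID, peerID string) MinerClaim {
	return MinerClaim{minerID, peerID, ed25519.Sign(wallet, claimPayload(minerID, peerID))}
}

// VerifyClaim is run by the checker. advertisingPeerID is the peer ID that
// actually published the advertisement carrying the claim.
func VerifyClaim(c MinerClaim, advertisingPeerID string, walletPub ed25519.PublicKey) bool {
	if c.ProviderPeerID != advertisingPeerID {
		return false // claim was issued for a different provider
	}
	return ed25519.Verify(walletPub, claimPayload(c.MinerID, c.ProviderPeerID), c.Sig)
}

// demo checks an honest claim, a claim copied verbatim by another provider,
// and a copied claim whose peer ID field was rewritten without re-signing.
func demo() (honest, copied, forged bool) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	claim := SignClaim(priv, "f01234", "12D3Alice")
	honest = VerifyClaim(claim, "12D3Alice", pub)
	copied = VerifyClaim(claim, "12D3Mallory", pub)
	rewritten := claim
	rewritten.ProviderPeerID = "12D3Mallory"
	forged = VerifyClaim(rewritten, "12D3Mallory", pub)
	return honest, copied, forged
}

func main() {
	fmt.Println(demo())
}
```

If the payload covered only the miner ID, the copied claim would verify for any provider that republished it, which is the adversarial case raised above.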
To make sure I understand what's happening here:
It sounds like this should work from everyone's perspective
@bajtos and the Curio team have agreed on a solution proposed in https://docs.google.com/document/d/1LeZ3miNRNH0Mhxl_G2LjRk29vni5l83KhPkjDa552J8/edit?usp=sharing. We will be working on the smart contract and other changes. This should remove any requirement on the IPNI side.
Currently, all providers are identified using the libp2p peer ID. This creates a problem for providers which do not have a libp2p subsystem (e.g. an HTTP provider).
My requirement:
Curio does not use the same libp2p peerID for the IPNI provider and for libp2p. The libp2p ID is shared between multiple minerIDs, which makes using it for IPNI impossible. I need a reliable way to establish a relation between a peerID and a miner ID within the indexer, i.e. with no external lookup.
Possible solutions: