-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternative mesh routing protocol options #57
Comments
Would it be reasonable to port and or implementation the batman-adv to the
ESP32 and leverage both or either of the wifi or Lora radios?
https://openwrt.org/docs/guide-user/network/wifi/mesh/batman
…On Thu, Feb 27, 2020, 10:07 AM X-Ryl669 ***@***.***> wrote:
If I understand your wiki completely, you're only dealing with 2 level of
abstraction for the routing table.
As I understand it, it almost impossible to reach a node in a large
network (let's say on the 120-th hop), because:
1. The origin node will never be aware of it
2. The limited routing table size can not grow exponentially.
3. The metric are not static, it can fluctuate depending on the moon
or the position of the node or a car passing by and so on. Thus, sometimes
packet will flow into the network correctly, sometimes they'll just vanish.
Typically, let's say you have this topology:
[A] => [B] => [C] => [D] => [E] => [F]
\=> [E]=> [F]
If node A wants to talk to node F, then it must know about it (this is
impossible to solve by storing all possible nodes's MAC, BTW).
Let's say that it has learnt about it previously, and now wants to talk to
it.
It sends a packet with DEST=F and in its routing table, it needs to go via
NEXTHOP = B.
Similarly, B needs to have it in its routing table also (how could that
possible scale if the number of node is very high ?).
Let's say B is Rainman and remembers about all possible nodes.
When B receives it, it'll have to make a choice between C and E for its
nexthop.
The metric for C might be better because C is closer to B, even if there
is an additional node D in the chain. So it chooses C. The conditions are
bad, and the packet struggles to follow the chain that goes via D to reach
E (or F) immediately.
After some time, A would resend the packet to F, and by the time, the
metric in for F would increase so that B select E.
D finally wakes up and send the packet to E and finally F.
Then F can/will receive 2 packets (one coming from D=>E, another from E).
How can it replies ?
The issue described above is not a theory, it's the main routing issue we
observe on internet. I'm not even speaking of bad route or packet loss here.
As I understand it, any approach based on storing a dictionary with all
the possible participant will fail since it grows exponentially in memory
and resources consumption (image the above example if all nodes could see F
except A).
1. We need some kind of path detection algorithm to figure out the
best route to a destination.
2. We need some kind of logarithm abstraction somewhere to break the
exponential grow (something like a tree or a forest algorithm) required by
the current algorithm.
3. We need a way to avoid dual path sending (because it'll be almost
impossible to keep the duty cycle to 1% if all paths starts sending the
same packets twice)
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#57?email_source=notifications&email_token=AHM3AJD4KTX7U4BI26TOPALRE76M7A5CNFSM4K5AILM2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IQ37XUA>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHM3AJE4D4VJUYKPFPGJDK3RE76M7ANCNFSM4K5AILMQ>
.
|
Sorry I don't have much experience in this regard, but what about IPv6 on this? Maybe Yggdrasil (like cjdns). This maybe totally unrelated and not even possible, just throwing ideas around. |
@Head2Wind @randomodbuild The maximum LoRa packet size is not large enough for even the smallest IPv6 packet. Currently babel uses IPv6. I'm not convinced that babel is appropriate for disaster.radio but it might be an interesting experiment if someone modified it to work with the smaller packet size.
Do you have an example of a mesh protocol that does not store an entry for each participant? I'm certain the current protocol can be greatly improved and I'm interested in any previous work in this area. My only experience is with WiFi mesh protocols such as babel, batman-adv and OLSR and they all store entries for every node in the mesh. |
Internet ? (with all its protocols, ICMP / Spanning Tree Protocol / BGP). |
@X-Ryl669 the limitations you describe is the same basic limitation that babel, batman-adv, olsr and other mobile ad-hoc mesh routing protocols have. But it is much more severe as LoRa allows only very small amounts of data to be transmitted. DNA seems to transmit complete routing tables (up to 30 routing entries in one transmission) and data packets also serve a hello packets. There are approaches that do not need to transfer all MAC addresses, but they have other drawbacks or have not left research (yet). Yggdrasil uses an approach using a spanning tree (a name-dependent routing layer) and a DHT on top to map MAC addresses to locations on the spanning tree (aka name-independent routing). There are other approaches that have not been tested in the real world... |
If we can assume that something like 50% of nodes has access to GPS, either on-board or via the Bluetooth app, could we do something with that? The nodes just worry about accurate routing within +/- 0.5 degrees lat/long of their location, and aside from that they just send the packet closer to its grid location, towards nodes that will have a more accurate idea of where it is. |
@samuk what you propose is geographic routing. The general problem with this is that you will have local minima where packets get stuck (like in a one way street). The usual approach to "fix" this problem is to use a fallback-algorithm to get unstuck (e.g. face routing). I think this is not reliable and does not always avoid loops. Anyway, you do not need GPS. Some papers suggest randomly selected "light tower" nodes to form a virtual coordinate system. But imho, you can create virtual coordinates even without (e.g. based on Vivaldi Network Coordinates). |
Thanks, I'd probably heard the idea somewhere but didn't realise geographic routing was a well-established concept. I also spotted this 2015 paper which suggests Cost to the Progress Ratio Routing might be worth a look. @X-Ryl669 Our current understanding is that there is a 10% duty cycle in the EU, is this not correct? |
Hi, What about using 6LoWPAN Header Compression to reduce the packet size? |
I wonder if perhaps we are trying to resolve a problem that is an outlying
possibility. IMO IPv6 is overkill for what the primary use case is. IPv4
should be sufficient, unless I am missing something. Also, based upon
this, the number of devices that need to be part of the Layer 2 mesh could
be substantially reduced. I don't know the LoRa com protocol very well at
all so cannot speak to a meaningful answer to my thoughts on the matter.
Personally, I like the possibility to of a long range L2 based 'wWAN' with
LoRa, plus a near range L2 'wLAN' with a designated node for 'wWAN-route'
with the WiFi (2.4ghz), then a BLE serial com link for app interactions.
These are a lot of wants and its likely complex to make it this way.. Just
thinking out loud.
…On Mon, Mar 2, 2020 at 7:30 AM Esteban ***@***.***> wrote:
@Head2Wind <https://github.com/Head2Wind> @randomodbuild
<https://github.com/randomodbuild> The maximum LoRa packet size is not
large enough for even the smallest IPv6 packet. Currently babel uses IPv6.
I'm not convinced that babel is appropriate for disaster.radio but it might
be an interesting experiment if someone modified it to work with the
smaller packet size.
@X-Ryl669 <https://github.com/X-Ryl669>
As I understand it, any approach based on storing a dictionary with all
the possible participant will fail since it grows exponentially in memory
and resources consumption
Do you have an example of a mesh protocol that does not store an entry for
each participant? I'm certain the current protocol can be greatly improved
and I'm interested in any previous work in this area. My only experience is
with WiFi mesh protocols such as babel, batman-adv and OLSR and they all
store entries for every node in the mesh.
Hi, What about using 6LoWPAN Header Compression to reduce the packet size?
And similarly, the RPL protocol for 6LoWPAN networks is virtual coordinate
routing protocol (concretely a gradient-based routing protocol). I think it
could solve the problem of having a next hop for every node in the network
and could scale better.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#57?email_source=notifications&email_token=AHM3AJBDTXH36NK7KKUYTQ3RFPGJNA5CNFSM4K5AILM2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENPXVBY#issuecomment-593459847>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHM3AJHNG77UG7Y4VUHWI5DRFPGJNANCNFSM4K5AILMQ>
.
--
~~ Ken Nelson ~~
Voice: 360.389.ADVM (2386)
“The mind is sharper and keener in seclusion and uninterrupted solitude.
--- Be alone, that is the secret of invention; be alone, that is when ideas
are born.” - Nikola Tesla
ADVmachines™ Proven Solutions for your Worldwide Adventures
www.advmachines.com ~ www.motolumen.com ~ www.motosherpa.com ~ www.xfile.ai
<http://www.advmachines.com>
www.advmod.com ~ www.eighteen650.com
|
From what I read about 6LoWPAN, it looks like it does not support meshes with loops. It forms a tree and the root is one edge router/gateway. |
Pros of 6LoWPAN:
Pros of RPL protocol:
Pros of LRP protocol for mesh routing:
Cons of 6LoWPAN:
Cons of RPL protocol:
Cons of LRP protocol:
If you read french, this is a very good document comparing few mesh protocols. For english speakers, see here for presentation of protocol and a reference source code for ContikiOS |
The team has done a nice job building the existing LL2 protocol. What about not sending/shearing routing tables at all, and use packets header information instead? All nodes would be able to update the routing tables by reading the headers of all "heard" packets. This would keep the routing tables updated, even if changes in the mesh were fast (compared to periodic table updates). The header could look like this, name (byte): If used 4 byte names, the header would be in total of 22 bytes, for 3 byte names 18 bytes, and for 2 byte names 12 bytes in total. It is a increase for the header, compared to the current LL2 header, but there would be no need for forwarding costly routing tables. The header is always added, and updated while forwarding, to all packets sent. The nodes would then update their neighbors and routing tables, according to the information they receive from the headers they "hear" (not only from packets they are the recipient or intended forwarder). As a packet is received, the node sends an ACK to the source_node. As the messages and ACK's are passed, the nodes forwarding, or "listenig", can update the total hop counts to other nodes, and thereby count the metrics. Reducing airtime by:
If all control packets are discarded, from the protocol, then how do we begin messaging if no routes exist? I do suggest flooding as a failower method: --> Flood (broadcast) if no routes Small mesh 1-2 hop networks should shift to routing within first packets sent on the network. Larger networks might sift to routing by "zones", as 1-2 hop nodes are quickly learned, and the routes to the nodes on edges might take somewhat longer to learn. Some flood optimizations to consider:
Some routing optimizations to consider:
Any comments? |
Thanks!
I've wanted to roll a portion of routing table maintenance into normal operation and I agree that doing so makes sense.
To make sure I'm understanding this correctly, if you have the simple network, Then a example header for a message sent A to F at the C to D hop would read, ttl, address of A, address of F, address of C, address of B, address of D, seq_number This isn't so different from the existing packet/datagram combined header, it just adds two addresses, the source node and the previous node. However, whose sequence number is in the packet? How does the receiving node calculate a metric for three hops with only one sequence number? Another issue is that, like MAC headers, I would want packet headers to only be used node-to-node. After one hop the packet header is stripped off and rewritten for the next hop. Only the datagram stays the same over the course of the route. So I would rather expand the datagram, like so, ttl (1), sender_node (4), next_node (4), seq_num (1), destination_node (4), source_node (4), source_seq_num (1) Notice that I added the sequence number of the source node. This would allow the receiving node to calculate the metric for the source node as well as the sender node. I'm still not sure this would work unless each source-destination pair has their own sequence number, though this adds another layer of complication where you are actually calculating the metrics of routes instead of destinations. The only way you could reasonably include the prev_node is if you also provided a sequence number from that node. Otherwise you know nothing more than that you have received some packets from that intermediate node. In this design, every node would need to calculate metrics for every route on it's own, rather than nodes sharing metrics with one another. You also don't have a good way of knowing how many hops you are from the source node without adding a hop counter byte to the datagram. My main question is, how does a node make routing decisions? If a node has multiple routes to the same destination, how does the node decide now which route to use? If it's only based on metrics. How do you calculate accurate metrics for all destinations with limited knowledge of the route to get to them?
This gives me the idea that we could provide the ability to choose how large you want your address to be? Might be a cool option. So if you want a small, high-efficiency network, you could choose 2 byte address, but then you would have to manually set node addresses because there are only 254 usable addresses. Or if you are deploying a larger, more dynamic network, you could use 4 byte addresses.
I'm hesitant about adding ACKs because of the limited duty cycle and introducing too much noise to networks.
I think we would still want occasional "i'm alive" messages sent when the network is quiet so that routes don't get stale if a node is not used for a while.
The idea of routing by zones is interesting to me. Could be a good line of thought to investigate further. I have a hard time squaring it with the dynamic, ad-hoc design goal of LL2, but maybe I just need to think about it more deeply. I like some of your ideas. It almost sounds like a source specific protocol. It may be worth simulating this to see how a larger network would deal with this alternative method of routing table management.
See above :) |
@paidforby Interesting notes!
Correct.
My mistake - my seq_number refers to total hops from source_node, to current node. In your example, it would be 2 hops (B and C). Therefore all nodes hearing the packet, can count hops to neighbors, previous sender, and to the source. Whereas you refer, with sequence number, to a increasing packet resend number (identifying packet loss)? We shall change, to avoid further mistakes: seq_num meaning times sent, and hop_count meaning total hop count from source?
The headers should always be recreated or updated, as you have to update the sender_node, prev_node, next_node, and hop counter from source. Where:
The hop count could be counted from the ttl, if the ttl was fixed. This would limit the options, and thereby I suggest we keep the ttl and hop count separate.
I believe this would work for updating headers:
See above (better description of hop counter vs. seq number).
The node makes decisions by:
In this model, the route is defined by min hop count. The routes are built upon information gathered from the headers: neighbours, 2 hop neighbours and hop count to different destinations. You get, as a forwarder/listener, routes, not only, for the sources (by parsing the header of the messages), and for the destinations (by parsing the header of the ACKs), but also routes to your neighbors and neighbors 2-hops away (to both directions). Therefore the routing tables are established quickly as messaging starts (boosted during flooding as the node can parse all headers it "hears"). As described, the ACK (from destination_node), keeps the user happy, and helps building routes. I think sequence metrics are not as important. This is due to the fact, that by this method, the source gets the ACK if message passes, and "listeners" can update their routes. Therefore the routes are kept fresh. Some randomness, as the delivery with the same hop count can be done in many ways, for routes can be actually good, to not to overload the only route?
This is a good idea. In my opinion, all the settings/values regarding the protocol should be changeable by the user. This way the protocol could be used within many different applications. Off course, "best" default values should be in place.
In this scenario, ACKs deliver great value, as they're carrying information of a) routing information to nodes (hops to destination, neighbours and 2-hop information), and b) users can confirm the message has been delivered. There could be a setting to enable or disable automatic ACK from LL2?
All messages are beacons. Therefore, a GPS update packet would keep the routes alive. If routes get stale, the message is flooded (to learn the routes again). Could this route_delete_delay also be changeable by the user? As I tend to think there might be quite static networks, and even users wanting to have some of their nodes in a passive (listen only) mode, for saving battery. There could also be a beacon_interval setting for time (and change of position)?
By zones, I tried to describe the differences formed within a real world wireless mesh: nearby nodes tend to hear lots of neighboring traffic (nodes within their range), and thereby build more accurate routing tables of their neighbors and their neighbors, whereas two nodes many hops away, tend to build good routing tables slower (as no messages might not been sent from one end of the mesh to another one). This causes a situation, especially in the beginning or during fast changes of the mesh, where nearby packets are routed, whereas packets to nodes far away are flooded (as no routes have been established yet). This behavior is ok and expected. |
Very interesting. Thanks for clarifying many of your points. It actually sounds very similar to a dynamic source routing protocol, which was also suggested by @geeksville here sudomesh/LoRaLayer2#10 (comment) and here #52 (comment). I'm going to think more about this DSR idea, i think it may be a good one. Also, this would give me a reason to revisit the simulator and sync it up with all the changes to LL2. I'll keep y'all posted if I make any progress toward a DSR fork. |
You might find these findings interesting (broadcasting, routing, zero-control-packet, and reactive & pro-active protocols in mesh networking):
Something similar could be done within LL2?
Why only links refering to GoTenna? Because there have not so far been any other well known options.. |
Cool stuff, too bad their code isn't open source. I started working on a more DSR-like fork of the LL2 protocol here, https://github.com/sudomesh/LoRaLayer2/tree/DSR My thinking, and what I believe @cyclomies is suggesting, is to compress the routing table into the LL2 packet header. The new packet struct looks like this,
The datagram struct remains the same, but is now defined in the LL2 code,
To initialize the routing process, nodes must begin by transmitting packets with the broadcast address as the destination and receiver. Upon receiving a broadcast packet, neighbors inspect the header to see who was the Once nodes have neighbors, they can begin transmitting packets with a neighbor address as both destination and receiver. Neighbor nodes then listen to all packets that are transmitted. Upon receiving they inspect the packet for three items,
Then after sending packets to all neighbors, a node can begin attempting to sending packets with a two hop destination via the neighbor. This will then allow multi-hop routes two start developing metrics based on the metrics of each hop. The internal routing table management is almost exactly the same. which means that it is still very compatible with the original table sharing protocol. Really the only difference is that instead of sending routing tables in a single, large packet, routing table entries are sent one at a time inside of the header of every packet. This should work well. Though, I will still provide the option to enable control messages that share the entire routing table, such that a network could converge faster at the expense of airtime and increased noise. My next step is to port LL2 back to the simulator to test the validity of this protocol. And to write up full documentation of this modification on the wiki. |
Totally agree, and empathizes the value of this project!
Correct. Sorry for my unclear text.
I would suggest to add previous_sender (one sender before the last sender). This way a node could, not only get the last sender, but also the sender before the last sender. As such, the information of the header would let the node build a routing table (and hop counts..) of its neighbors, its neighbors neighbors, and routes to sources.
Destination address could be the node we are looking for?
..and if we had previous_sender (sender before the last sender), we could add 2 hop destinations to the table.
..and to nodes two hops away, if previous_sender would be added in the header, and node databases would be updated corresponding.
I suggest the nodes would use all packets they overhear for building the node database, but process only messages they are the intended sender or destination.
Correct. And if adding the previous_sender, we would learn the routes within our neighborhood (1-2 hop away) fast, as we overhear nodes transmitting around us.
And allow the node to route packets to nodes, that it has been heard as sources (as it knows the starting path and the hop count). If, sorry for repeating, we had previous_sender information received, we could route already to nodes three hops away. This would expand the the coverage for routing a lot, without a too large header.
This would be a good option, especially used on static nodes and when the network has been quiet (otherwise, we might start to be unsure of our old routes?). I do believe this could be left unused, if the intended destination nodes would send ACKs of recieved packets. This would, not only let the sender know the packet was received, but also update node databases for packets coming from the direction of the last destination (=learn routes with fewer hops to the destinations). Furthermore, previous_node information could be used to determine shorter routes to nodes: if we overhear a packet sent from previous_node and hear the packet of the sender, we could correct our route for the source node to go straight through previous_node. By adding this, we could always, when overhearing a packet, update our routing table for source, sender, and previous_sender. If ACKs where integrated, we would also learn the routes towards the destinations (as we would soon hear the ACK). In a small 6-10 node max 4 hop network, we could learn the whole mesh maybe with 1-2 broadcast messages and 1-3 unicast messages (including the ACKs), as all packets reveals routes to three nodes.
Good luck! |
I think my issue with adding a previous_sender address is that it is another address of uncertain quality (the receiver address also being uncertain until you actually hear from it) and does not contain useful information until you have a five node (four hop) wide network. In a four node (three hop) wide network, the previous_sender would always be either empty or the source. Could be a useful option still. I will look into testing it in the simulator.
This is currently how I've written the protocol. It always parses the header for routes, but it only retransmits the packet if the receiver address matches the nodes local address and it only passes the packet to Layer3 if it if the destination matches your nodes local address or the broadcast address. See the
Good point, I understand what you are suggesting. Right now we are working under the idea of manual ACKing by the user of a node. Maybe could make automatic ACKs with an optional feature. I definitely see the value of ACKing with this style of protocol. I will play around with these ideas in the simulator soon. |
Also if anyone is interested in playing with the this fork of the protocol, I've synced up the 1.0.0-rc.2 branch with the LL2 DSR branch. |
You're right on point. Previous_sender should help with larger mesh networks. There is, according to my understanding, one more benefit of that feature: if, and when, node A overhears several nodes (B-D) routing packets to B from source Z, A can recognize a shorter route to E trough D (previous_sender for packet forwarded by D), even if E was not the source. previous_sender should not left empty, but marked as source (hop counting logic for finding shortest route should be easier to do?). This could not only shorten hop count of one node to another one (A-->E), but also optimize the whole "zone of overhead transmits".
I'm interested to hear your verdict of those simulations on both prev_sender and ACKs. Option for manual and automated ACKs sounds great (automated for one to one messaging, and reducing traffic by manual ACK for one to all room messages?). A video/animation of the simulations could be fun to watch. |
@geeksville did you have a chance to check out the DSR fork? Might it meet your needs? |
@paidforby, did you find time for those simulations? |
Not yet, I became incredibly distracted with updating the the simulator to make it more usable (namely introducing another websocket server at Layer 3 so I can send messages using the real web app). I should be able to test in another day or two. |
@samuk Alas I have not, I'll definitely check it out in the next couple of weeks. I still might end up doing my own DSR, based on cribbing from radiohead's version and RFC 4728 (because it is super mature and well studied). The radiohead guy seems to have just taken that RFC and only implemented the fourish minimum requirements. I'd love to share an implementation with ya'll but I also kinda love protocol buffers for this sort of stuff. If curious my current work queue is here. |
Sounds interesting, I think I understand this concept, let me restate to see. So we add eight numbers, and two +/- symbols to give latitude/ longitude to one decimal place. This gives us 6.9mile squares covering the earth. We also add one (b)order flag to each packet. So this would cost 11 characters per packet? When I add my first node in a network I either manually enter the lat/long, or the GPS does it. For this theoretical network, it is 35.1, -121.5 Each additional node brought into the network listens for 100 packets and uses the most frequent Lat/long it hears as it's 'main' network (eg 35.1, -121.5) If it hears additional lat/long then it also acts as a 'border node'. Routing within the 35.1, -121.5 square happens in the reactive way outlined by Grant above. If a 'normal node' receives a message originating outside of its main lat/long square (eg 35.1,-121.4) it just ignores/drops it. If a Border node receives a message originating in 35.1,-121.4 that has an unset (b)order flag it edits the packet to 35.1,-121.5 and sets the (b) flag before forwarding. Each packet should therefore only cross a geographic border once? Writing this I'm still not sure I understand it. Is that kind of what you meant? |
@cyclomies to clarify on the cold boot scenario. Since this is reactive routing there is no automated process to build routes (though I suppose one could be implemented). A user of a node must begin by broadcasting a message. Neighboring nodes hear the broadcast, add the sender as a route and then could then begin sending unicast messages to that sender. Once the original sender hears a broadcast or unicast message and adds routes, it can then also begin transmitting unicast messages. By saying "sending messages to it's neighbors immeadiately" I mean broadcasting messages. Broadcast messages are defined as messages addressed to all neighbors (1 hop destinations).
You are correct, there is no need to send the "via node" info in the route table packet, in fact I was already doing it in the most efficient way possible, see https://github.com/sudomesh/disaster-radio/wiki/Protocol#routing-table-packets. Thanks for pointing that out, I would have wasted a lot of time, trying to rethink something that was already as minimal as possible.
I suppose this is a matter of perspective. |
You've got the right idea, but let me clarify:
I think a more efficient system might be a bitwise geocode. For example, alternating E/W-N/S bits for increasing precision. Maybe the first bit is basically 0 for the west half of the globe, and 1 for the east half. Then the next bit is 0 for the North half and 1 for the South half. Then the next bits are 0 /1 for the West/East half of the west half of the globe, and the same for the N/S half of the North half of the globe, and so on. Using this method my napkin math tells me we should be able to get a precision of less than half a mile with 16 bits each, so a single int. Additionally, we don't WANT or NEED that level of precision, so either we can use fewer bits and pack them in, or always use an int and mask or shift it. Regarding the border node tag, see below.
I think this (barring the caveat above about the format of the address) is a great idea, for new nodes to potentially "learn" where they are based on surrounding nodes if they have not already been given a location. Other than that, yeah. Any node that handles a packet with a matching geocode just routes it using the mesh routing tables as discussed above.
In the system I was imagining, packets might have to cross a geographic border any number of times, and therefore I think some additional logic is necessary, but I see where you are coming from with regard to the changing of the border flag. I'm thinking no border flag is necessary on the packet because we have the geocode already in the address. Here's an example: Nodes 1, 2, 3, and 4 are all in the same submesh (0000) so they all route between each other like normal. We need to find a way for the routing table to include nodes outside of the submesh, but ONLY if they are directly linked to nodes inside the submesh. Then, if the destination is in the submesh, find the next best submesh hop like normal, but if it's outside the submesh, find the next best submesh and THEN find the next best hop to the "intermediate" node in the table that is in that submesh. I think choosing the right balance of submesh size (geocode precision) to routing table size would at least put a sort of limit on the total table size to the number of submesh nodes PLUS the number of border nodes in other submeshes. For low power mesh networks, this should be pretty low. I would think having a submesh size of something like a few miles might mean that most local comms within a town would be routed normally and georouting would be rare. I also suggested a Maidenhead style system of alternating lat/long with increasing precision because I have also wondered if there is a way to use a dynamic mask to allow submeshes to be "resized" in areas where that might be beneficial, since each node is handling this on a packet-by-packet basis instead of maintaining a full topology map outside of its submesh. After all, if there are only 5 nodes in a city, georouting is probably more of a detriment than a benefit, but if there are 5000, it might be a necessity. Alternately, if the submesh size is too small you might be able to detect somehow that you are running into the "dead end" problem and increase the size until you minimize that. Regarding routing: Now, node 3 does the same calculation. It has a packet destined for 1001, it is in 0000 so it looks for the best off-mesh link and finds 5. Normal mesh routing causes it to pick 2 or 4, let's say 2 has a better link quality so it chooses 2. Node 2 does the same thing and it has a direct link to 5 so it sends it to 5. Node 5 has the same problem as node 3: it has been handed a packet with a destination not in its own submesh, so it follows the same process and picks either Node 7 or C. Both have the same geo distance, so it probably picks by the next hop link quality for each of those possibilities. Let's say C. C hands it to A, and node A sees that this packet is destined for the submesh that matches its own, so it just looks it up in the table like normal (ignoring any geo addressing and skipping those calculations). |
@paidforby Understood, perfect. And if I understood correctly, broadcast messages will be re-broadcasted according to the TTL value the sender specify?
Well put! ;) To avoid collisions, there could be specific random delayed time slots for TX:
Slot 1 for ACK's of the intended receiver, as they will also end other nodes to continue broadcasting or routing to the intended recipient. Slot 2 to prioritize re-broadcasted packets, as they are extremely handy for node discovery and updating node databases. Therefore, they will work for optimize routing. Slot 3 for all routed packets and broadcasted packets from the originating node. In short: messages (unicast or broadcast), and packets intended for routing, would be sent with a 4-8s delay. Any packets intended for re-broadcast will be sent before them, as they will most likely help in discovering new nodes and optimize the routes. ACK's are always sent "immediately", as they will end unnecessary broadcasting and unnecessary routing, and therefore free up airtime. ACK's would only be sent in slot 1 for 1-to-1 messaging only. Other ACK's would be sent in slot 3. The ms used in slots are only examples, as I did not count the correct packet airtime or physical TX/RX delays. Those ms delays for slots should actually be specified with % (of max packet air time for 1 minute?), not hard coded, to work with different network settings, duty cycles and so on? |
About those TX slots I mentioned: At first, I incorrectly thought, that the packets (with a maximum character length) should be possible to send, before the next TX slot. This is off course not the case. Sorry for the inconvenience! Therefore, the TX timeslots could be as follows:
As such, a routed packet would be sent in slot 3, if there are no other traffic on the air at that time. Moreover, if a broadcast is sent, and four nodes (including the intended recipient) hear the packet, the intended recipient node could send the ACK packet (slot 1) before any other node would have the time to re-broadcast (slot 2) the message. And as the ACK is heard by other nodes, they update their nodeDB and do not re-broadcast the message. |
@paidforby are you interested in this supplemental geo approach suggested by @jacquesalbert ? Realise it's solving problems we don't yet have, but a globally scaleable network does appeal. |
I am interested in trying to implement it as optional feature. I need to think more deeply about how to actually integrate it into the LL2 library, not sure how soon i'll get to it. but I'll add it to my todo list. |
@paidforby I have been playing around with this idea a little bit with the help of the simulator and while I think that something like what I described above could be a very effective "best of both worlds" system, I have struggled a bit to fit all the pieces of the idea together in a clean way. However, I think that the most effective way to get some mileage out of it DOES actually solve a current problem with the mesh: convergence time, especially when the mesh is less than active. By implementing geohashed addresses (almost for free, data-wise, since you already need a fairly long unique address) you can implement simple directional georouting as a fallback for when the routing table does not yet contain the destination, or when the routing table has reached some maximum size. Basically, if a node needs to send a packet to something it doesn't have a route to, it just routes it to the node in its table that is the closest geographically to the destination. |
The S2 coordinate system might be useful for this geographic routing. Uses 64-bit coordinates which are the distance along a Hilbert curve superimposed on the globe, precision can be arbitrarily reduced, checking for containment of a point within a cell or adjacency of points are literally bitwise operations, etc. https://s2geometry.io/ |
@paidforby, Any major updates regarding LL2? |
Nothing much. I started a new j.o.b. so I've had less time to work on this. Most recent updates involved supporting dual lora modules (which will be on the new disaster radio dev boards). Next, I plan on updating/improving the simulator, but who knows when that will actually get done. I've mostly stopped revising the routing protocol because it requires too much time/energy. I encourage others to try implementing the geographic routing, which I still find very desirable. |
I noticed that Reticulum has had a first release: https://unsigned.io/projects/reticulum/ There's a discussion of porting to C: markqvist/Reticulum#2 but no code yet. Does anyone have the skills/interest to help with this? |
If I understand Reticulum correctly, it'll not fit LoRA radio (unlike what they claims), mainly because it requires a MTU of 500 bytes (which can not be done reliably on LoRAWAN since it implies multiple packet other a short time, if you want to respect the law that's a no-no). Also, I don't understand how they deal with exponential routing table issue (I think they don't deal with it in any way), so it'll require a lot of memory on the client to store the routes & Link. |
Recticulum seems to send announcements as floods. This does not scale very well. |
Reticulum is perfectly well suited to LoRa, but probably not LoRaWAN (which is a bit purposeless any way). It is correct that Reticulum has a fixed MTU of 500 bytes, which LoRa can accomodate just fine, and without any reliability impact. LoRaWAN probably can't - I haven't actually made any attempts at this, so that is just my guess. In no jurisdiction is "multiple packets over a short time" viewed as a "no-no". That is not what duty cycle means ;) Announcements are not floods in Reticulum. They are prioritised and forwarded in a way that scales quite well. Some more info: https://markqvist.github.io/Reticulum/manual/understanding.html#the-announce-mechanism-in-detail |
Thanks @markqvist what happens when the link table is full? "Any node that forwards the link request will store a link id in it’s link table , along with the amount of hops the packet had taken when received. " https://markqvist.github.io/Reticulum/manual/understanding.html#link-establishment-in-detail |
Ideally, that should not happen. Just one megabyte of memory can accommodate more than 22.000 active links, so in most cases, this will not be a problem. Inactive links are of course culled from the table automatically. |
You seem to be talking about LoRaWAN, which that table also seem to be describing. As I said before, I believe it's quite pointless to run Reticulum over LoRaWAN. LoRa != LoRaWAN LoRa is a PHY modulation scheme. LoRaWAN is a networking stack running on top of the LoRa PHY. Reticulum is another networking stack that can run on top of LoRa, amongst many other things. Maybe you are confusing the terms. Really, no offense intended here, but I don't really care much whether you "don't think Reticulum fits LoRa network". I have hundreds of LoRa devices deployed in the real world running Reticulum, and they seem to be doing fine, in spite of your thoughts on the matter. A lot of your statements in the above are false, in relation to LoRa. They are maybe true in regards to LoRaWAN. Edit: Either way, I think it is pointless for this thread to get filled with discussion on Reticulum, since it would not be of any help to the disaster.radio project before a C++ implementation is ready anyways, and that is still a while in the future. As of writing this, the only functional Reticulum release is the Python reference implementation. |
So ~4400 as an upper limit on 192Kb devices. |
~4400 if you can allocate 192KB to the link table, since other things will also need memory ;) So my guess is a 192KB device could probably handle around 1500 active links. But I think the link throughput and CPU cycles needed to actually pass traffic for 1500 active connections would be exhausted before the RAM runs out. |
No, it comes from here: https://lora-developers.semtech.com/library/tech-papers-and-guides/lora-and-lorawan/ and the table is about LoRa and not LoRaWAN. You can probably use your node the way you want (with low SF), not giving a thought about respecting the standards and rules (duty cycle for one) but doing so does not give confidence in your network, and it's bothering other users around (which might be using LoRaWAN, or not). As long as you're not spreading on all channels... Sure, you can have hundred of nodes that are close to each other and it'll work. I never had hundred of nodes but only few that are very far and the duty cycle constraint bothers me a lot more than the bandwidth constraint since sending data takes forever (and I'm talking about ~160 bytes here, not 500) and the node must stay awake when they transmit. All in all, I've followed the Reticulum protocol when it was first described and I think it's a good protocol by itself, yet, I feel it's too large to work on LoRa and you'll maybe prove me wrong (and it would be great!). One of the missing thing in the protocol, IMHO, is the possibility to pre-provision symmetric keys for group communication, in order to avoid transmitting all the session creation packets. After all, if you master the nodes on the network, you can start communicating with encryption (and a known, secret, symmetric key) without having to deal with setting up DH and that would save a lot of bandwidth. Another improvement I haven't seen in all the network mesh protocols, is to prepare for next packet wake up time to save battery. Many nodes have a very precise clock (GPS, oscillator, etc...) they could use to synchronize communication time, including transmission delay (thus allowing a better usage of the duty cycle limit) and compute the varying session's encryption keys from the elapsed time/wallclock time. That's some information that can be saved to transmit. |
@X-Ryl669, that table is specifically talking about modulation characteristics in the context of LoRaWAN networks. Maybe you should read the document again.
Who said I had a hundred nodes close to each other? Quit with the straw manning already dude ;)
Again with the straw man. That's a completely unfounded accusation, which I find just a wee bit offending ;) I sincerely hope you will take that back. I care a great deal about proper spectrum use, and I've designed and built commercial networks in both licensed and unlicensed spectrum in more or less every band from HF to microwave. To me it seems like you are not that knowledgeable about spectrum use regulation, outside of a rather specific area, which you then seem to believe you can apply to everything. Spectrum access regulation is pretty complex, and while that Semtech page you linked to is probably a fine reference for what you can and cannot do within the LoRaWAN standard, you can extract very little about actual regulation, ie. law, from that. Just so you know, there is much more spectrum available (both licensed and unlicensed) where LoRa can be used legally than just the standard channels specified by the LoRa Alliance. As I said before, I completely agree that using Reticulum on top of, or within the confines of the LoRaWAN standard is pointless at best. There are plenty of other cases where Reticulum over LoRa PHY works wonderfully though. Reticulum and LoRaWAN are very different protocols, that serve very different purposes. They just both happen to be able to use the LoRa PHY. I think you have a good point about how pre-provisioning of group keys are cumbersome at the moment. I can be done, but there is no easy-to-use API for it. I will look into that in the future. And by the way, the LoRa MTU is 255 bytes, not 242 as you wrote ;) |
I still don't know if Reticulum is actually a good fit for disaster.radio, since my knowledge of disaster.radio is way too limited. But for what it's worth, Reticulum can now, since commit cd8de6420155dc14f1b4743fc99d8c5bf68589cb, work with MTUs all the way down to 211 bytes. Bandwidth efficiency is negatively impacted by such a low MTU, but it does work. This is not "officially" supported, and should probably be considered rather experimental. The official default MTU is still 500 bytes, and running any other MTU than 500 will break intercommunication with networks that run the standard MTU. So while I would not recommend using a different MTU, it is now possible if you want to :) The change will be in beta 0.2.4, which I am going to release in a few days. |
@markqvist thanks for the update, we'd still need C I think before even exploring it. markqvist/Reticulum#2 |
If I understand your wiki completely, you're only dealing with 2 level of abstraction for the routing table.
As I understand it, it almost impossible to reach a node in a large network (let's say on the 120-th hop), because:
Typically, let's say you have this topology:
If node A wants to talk to node F, then it must know about it (this is impossible to solve by storing all possible nodes's MAC, BTW).
Let's say that it has learnt about it previously, and now wants to talk to it.
It sends a packet with DEST=F and in its routing table, it needs to go via NEXTHOP = B.
Similarly, B needs to have it in its routing table also (how could that possible scale if the number of node is very high ?).
Let's say B is Rainman and remembers about all possible nodes.
When B receives it, it'll have to make a choice between C and E for its nexthop.
The metric for C might be better because C is closer to B, even if there is an additional node D in the chain. So it chooses C. The conditions are bad, and the packet struggles to follow the chain that goes via D to reach E (or F) immediately.
After some time, A would resend the packet to F, and by the time, the metric in for F would increase so that B select E.
D finally wakes up and send the packet to E and finally F.
Then F can/will receive 2 packets (one coming from D=>E, another from E).
How can it replies ?
Now imagine that F moves closer to B, so that B could speak to it directly. Who is going to remove the old routing table value in C and D ? What it a packet is being sent to C for F and D is currently removing its entry for F in its routing table before C removed its own version ? In that case, the packet will be lost at D (when C sends it to D for F) and no feedback is sent to A about this packet loss.
This is going very bad on the global network consumption because, to deal with this kind of issues, A must resend the packet to ensure it'll arrive at its destination (wasting the global network resources) or any intermediary layer before F must answer to A somehow to tell it they've failed delivery (which could even be worse since a single packet could create a spread of packets back to the origin, if D tells A it failed, then C tells A it failed, and so on).
The issue described above is not a theory, it's the main routing issue we observe on internet. I'm not even speaking of bad route or packet loss here.
As I understand it, any approach based on storing a dictionary with all the possible participant will fail since it grows exponentially in memory and resources consumption (image the above example if all nodes could see F except A).
The text was updated successfully, but these errors were encountered: