Topics in Database Systems: Data Management in Peer-to-Peer Systems STRUCTURED OVERLAYS ADVANCED ISSUES (Week 4 - 9/11/2006) NOTES ================================================================== 1. Physical Network Awareness in CAN ------------------------------------------- For BATON. it is not clear which overlay nodes should be close to each other in the physical network, since there are adjacent, parent/child and left and right sibling nodes, all contacted in some steps of the join/leave/search procedure. METHOD 1: Using landmarks for topologically sensitive overlay construction CHORD Each node gets a new number (RID) based on an ordering of the landmarks based on their distance to it. Say, there are 5 landmarks, an RID for a node may be 14523 meaning the closest landmark is landmark 1, then 4, 5, 2 and 3. Now, instead of hashing the IP of a node, we hash its RID number using a locality-aware hash function. We expect an 1xxxx node to be closer (in the virtual CHORD cycle) to other 1xxxx nodes than say to 2xxxx nodes. BATON METHOD 2: Distance-Weighted Routing Metrics Use the neighbor with the smallest weighted distance wd, where wd is defined as wd = weight * distance to data + (1 - weight) * physical (RTT) distance CHORD: We need some form of replication, so that we can make a weighted decision amongst potential next step neighborhs We need more than one node per entry in the finger table, or we may modify the "closest preceding finger table" so that it takes RTT into account. BATON: METHOD 3: Overloading Coordinate Zones: Multiple (Physical) Nodes per Zone (Virtual Node) Remark: Clarify partition vs replication CHORD: Can be done, have a node share a virtual ID node, map a virtual ID node to a range of MAXPEER values [l, u] Each node that gets a hash key in [l, u] is mapped to the same virtual node. We update the tables, so that we maintain links to the closest in RTT times successors BATON: Can be done, we need to modify the join and leave procedures Assume MAXPEER actual nodes per virtual BATON tree node As is, join seeks for nodes with full left and right tables and less than two children Now each node is accepted to share a zone with some other nodes as long as < MAXPEERS share the zone Thus, at each step of join, we look whether a node has an empty position, if YES ok, else we keep searching We update the tables, so that we maintain links to the closest in RTT times neighbors 2. Load Balance --------------------- data and workload (including routing load) METHOD 1: More uniform partitioning (handles data load) CHORD: Not possible: assignment of data to ids is not flexible BATON: Parent splits data content with its new child If the parent has already one child, we may consider sharing the content among all three nodes METHOD 2: Cache and replication (handles workload) Not in all cases, instead of caching only at the requester, we could cache at the search path (this is called path replication in unstructured p2p systems) Path replication may be also made proactive, if we replicate an item at all its 2^i successor nodes CHORD: Works exactly as described in CAN, content may be replicated to the immediate successor(s) or to the successors in the finger table at distance id+2^i (i<>0) BATON: Cache may work exactly as described in CAN Replication is tougher, we may replicate at say the parent (?) adjacent node in the in-order traversal? METHOD 3: Overloading Coordinate Zones: Multiple (Physical) Nodes per Zone (Virtual Node) May help for workload load balance if there is replication METHOD 4: Upon leaving merge appropriate zones (for data load balance) CHORD: not possible BATON: not possible 3. Fault Tolerance --------------------- Both routing table and data redundancy METHOD 1: High dimensions (routing redundancy) CHORD: Not applicable BATON: Perhaps, we may move to multi-way trees METHOD 2: Multiple realities (both routing and data redundancy) CHORD: A node takes different positions in the overlay BATON: Not applicable METHOD 3: Multiple hash functions (both routing and data redundancy) CHORD: May work, in effect, this results in building multiple CHORDs BATON: Not applicable METHOD 4: Overloading coordinate zones (routing and data redundancy (if entries are replicated and not partitioned) CHORD: May work BATON: May work METHOD 5: Cache and replication (data redundancy) CHORD: May work BATON: May work