6.5840 Lab 5: Sharded Key/Value Service
Introduction
You can either do a final project based on your own ideas, or this lab.
In this lab you'll build a key/value storage system that "shards," or partitions, the keys over a set of Raft-replicated key/value server groups (shardgrps). A shard is a subset of the key/value pairs; for example, all the keys starting with "a" might be one shard, all the keys starting with "b" another, etc. The reason for sharding is performance. Each shardgrp handles puts and gets for just a few of the shards, and the shardgrps operate in parallel; thus total system throughput (puts and gets per unit time) increases in proportion to the number of shardgrps.
The sharded key/value service has the components shown in the lab diagram. Shardgrps (blue squares) store shards with keys: shardgrp 1 holds a shard storing key "a", and shardgrp 2 holds a shard storing key "b". Clients interact with the service through a clerk (green circle), which implements Get and Put methods. To find the shardgrp for a key passed to Put/Get, the clerk gets the configuration from the kvsrv (black square), which you implemented in Lab 2. The configuration describes the mapping from shards to shardgrps (e.g., shard 1 is served by shardgrp 3).
An administrator (i.e., the tester) uses another client, the controller (purple circle), to add/remove shardgrps from the cluster and update which shardgrp should serve a shard. The controller has one main method: ChangeConfigTo, which takes as argument a new configuration and changes the system from the current configuration to the new configuration; this involves moving shards to new shardgrps that are joining and moving shards away from shardgrps that are leaving. To do so the controller 1) makes RPCs (FreezeShard, InstallShard, and DeleteShard) to shardgrps, and 2) updates the configuration stored in kvsrv.
The reason for the controller is that a sharded storage system must be able to shift shards among shardgrps: for load balancing, or when shardgrps join and leave (new capacity, repair, retirement).
The main challenges in this lab will be ensuring linearizability of Get/Put operations while handling 1) changes in the assignment of shards to shardgrps, and 2) recovering from a controller that fails or is partitioned during ChangeConfigTo.
- If ChangeConfigTo fails while reconfiguring, some shards may be inaccessible if they have started but not completed moving from one shardgrp to another. The tester starts a new controller; your job is to ensure that the new one completes the reconfiguration that the previous controller started.
- ChangeConfigTo moves shards from one shardgrp to another. You must ensure that at most one shardgrp is serving requests for each shard at any one time, so that clients using old vs new shardgrp don't break linearizability.
This lab uses "configuration" to refer to the assignment of shards to shardgrps. This is not the same as Raft cluster membership changes; you don't have to implement Raft cluster membership changes.
A shardgrp server is a member of only a single shardgrp. The set of servers in a given shardgrp will never change.
Only RPC may be used for interaction among clients and servers (no shared Go variables or files).
- Part A: Implement a working shardctrler (store/retrieve configurations in kvsrv), the shardgrp (replicated with Raft rsm), and a shardgrp clerk. The shardctrler talks to shardgrp clerks to move shards.
- Part B: Modify shardctrler to handle failures and partitions during config changes.
- Part C: Allow concurrent controllers without interfering with each other.
- Part D: Extend your solution in any way you like (optional).
This lab's design is in the same general spirit as Flat Datacenter Storage, BigTable, Spanner, FAWN, Apache HBase, Rosebud, Spinnaker, and others (details differ).
Lab 5 will use your kvsrv from Lab 2, and your rsm and Raft from Lab 4. Lab 5 and Lab 4 must use the same rsm and Raft implementations.
You may use late hours for Part A only; you may not use late hours for Parts B–D.
Getting Started
Do a git pull to get the latest lab software.
We supply you with tests and skeleton code in src/shardkv1:
- shardctrler package: shardctrler.go with methods for the controller to change a configuration (ChangeConfigTo) and to get a configuration (Query)
- shardgrp package: shardgrp clerk and server
- shardcfg package: for computing shard configurations
- client.go: shardkv clerk
To get up and running:
$ cd ~/6.5840
$ git pull
...
$ cd src/shardkv1
$ go test -v
=== RUN TestInitQuery5A
Test (5A): Init and Query ... (reliable network)...
shardkv_test.go:46: Static wrong null 0
...
Part A: Moving Shards (hard)
Your first job is to implement shardgrps and the InitConfig, Query, and ChangeConfigTo methods when there are no failures. The code for describing a configuration is in shardkv1/shardcfg. Each shardcfg.ShardConfig has a unique identifying number Num, a mapping from shard number to group number, and a mapping from group number to the list of servers replicating that group. There will usually be more shards than groups so that load can be shifted at a fairly fine granularity.
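Conceptually, a configuration carries the information below. Num is the field named in this handout; the other field and type names are illustrative only, so check shardcfg/shardcfg.go for the actual definition:

// Illustrative sketch only; see shardcfg/shardcfg.go for the real declaration
// of ShardConfig and of the Tnum, Tgid, and NShards identifiers used here.
type ShardConfig struct {
	Num    Tnum              // unique, increasing configuration number
	Shards [NShards]Tgid     // shard number -> gid of the shardgrp serving it
	Groups map[Tgid][]string // gid -> servers that replicate that shardgrp
}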
1. InitConfig and Query (no shardgrps yet)
Implement in shardctrler/shardctrler.go:
- Query: returns the current configuration; read it from kvsrv (stored there by InitConfig).
- InitConfig: receives the first configuration (a shardcfg.ShardConfig from the tester) and stores it in an instance of Lab 2's kvsrv.
Use ShardCtrler.IKVClerk Get/Put to talk to kvsrv, ShardConfig.String() to serialize for Put, and shardcfg.FromString() to deserialize. You're done when you pass the first test:
$ cd ~/6.5840/src/shardkv1
$ go test -run TestInitQuery5A
Test (5A): Init and Query ... (reliable network)...
... Passed -- time 0.0s #peers 1 #RPCs 3 #Ops 0
PASS
ok 6.5840/shardkv1 0.197s
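For step 1, a minimal sketch of InitConfig and Query along these lines, assuming the versioned Get/Put signatures from Lab 2 and a single fixed key for the configuration; the key name, error handling, and exact signatures are placeholders to adapt to the skeleton:

const configKey = "config" // hypothetical key under which the configuration is stored

func (sck *ShardCtrler) InitConfig(cfg *shardcfg.ShardConfig) {
	// Serialize the first configuration and store it in kvsrv;
	// version 0 creates the key (error handling elided in this sketch).
	sck.IKVClerk.Put(configKey, cfg.String(), 0)
}

func (sck *ShardCtrler) Query() *shardcfg.ShardConfig {
	// Read the serialized configuration back and decode it.
	val, _, err := sck.IKVClerk.Get(configKey)
	if err != rpc.OK {
		return nil
	}
	return shardcfg.FromString(val)
}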
2. Shardgrp and shardkv clerk (Static test)
Implement an initial version of shardgrp in shardkv1/shardgrp/server.go and a shardgrp clerk in shardkv1/shardgrp/client.go by copying from your Lab 4 kvraft solution. Implement the shardkv clerk in shardkv1/client.go that uses Query to find the shardgrp for a key, then talks to that shardgrp. You're done when you pass the Static test.
- Upon creation, the first shardgrp (shardcfg.Gid1) should initialize itself to own all shards.
- shardkv1/client.go's Put must return ErrMaybe when the reply was maybe lost; the inner (shardgrp) Put can signal this with an error.
- To put/get a key from a shardgrp, the shardkv clerk should create a shardgrp clerk via shardgrp.MakeClerk, passing the servers from the configuration and the shardkv clerk's ck.clnt. Use ShardConfig.GidServers() to get the group for a shard.
- Use shardcfg.Key2Shard() to find the shard number for a key. The tester passes a ShardCtrler to MakeClerk in shardkv1/client.go; use Query to get the current configuration.
- You can copy Put/Get and related code from kvraft client.go and server.go.
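Putting these hints together, the shardkv clerk's Get might look roughly like the sketch below; the clerk fields, the signatures of Query, GidServers, and MakeClerk, and the error constants are assumptions to adapt to your own code. Put follows the same pattern but must also map a possibly-lost reply to ErrMaybe.

func (ck *Clerk) Get(key string) (string, rpc.Tversion, rpc.Err) {
	for {
		cfg := ck.sck.Query()                   // current shard -> shardgrp assignment
		shard := shardcfg.Key2Shard(key)        // shard number for this key
		_, servers, ok := cfg.GidServers(shard) // servers of the shardgrp serving that shard
		if ok {
			grp := shardgrp.MakeClerk(ck.clnt, servers)
			if val, ver, err := grp.Get(key); err != rpc.ErrWrongGroup {
				return val, ver, err
			}
		}
		// ErrWrongGroup (or no owner yet): the configuration has changed,
		// so re-read it and try again.
		time.Sleep(100 * time.Millisecond)
	}
}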
$ cd ~/6.5840/src/shardkv1
$ go test -run Static
Test (5A): one shard group ... (reliable network)...
... Passed -- time 5.4s #peers 1 #RPCs 793 #Ops 180
PASS
ok 6.5840/shardkv1 5.632s
3. ChangeConfigTo and shard movement (JoinBasic, DeleteBasic)
Support movement of shards among groups by implementing ChangeConfigTo: it changes from an old configuration to a new one. The new configuration may add new shardgrps or remove existing ones. The controller must move shard data so that each shardgrp's stored shards match the new configuration.
Suggested approach for moving a shard:
- Freeze the shard at the source shardgrp (that shardgrp rejects Puts for keys in the moving shard).
- Install (copy) the shard to the destination shardgrp.
- Delete the frozen shard at the source.
- Post the new configuration so clients can find the moved shard.
This avoids direct shardgrp-to-shardgrp interaction and allows serving shards not involved in the change.
Ordering: Each configuration has a unique Num (see shardcfg/shardcfg.go). In Part A the tester calls ChangeConfigTo sequentially; the new config has Num one larger than the previous. To reject stale RPCs, FreezeShard, InstallShard, and DeleteShard should include Num (see shardgrp/shardrpc/shardrpc.go), and shardgrps must remember the largest Num they have seen for each shard.
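A sketch of ChangeConfigTo following this plan; NShards, the GidServers signature, the clerk constructor, the controller's clnt field, and the postConfig helper are assumptions, not the skeleton's API:

func (sck *ShardCtrler) ChangeConfigTo(newCfg *shardcfg.ShardConfig) {
	oldCfg := sck.Query()
	for sh := 0; sh < shardcfg.NShards; sh++ {
		oldGid, oldSrvs, _ := oldCfg.GidServers(sh)
		newGid, newSrvs, _ := newCfg.GidServers(sh)
		if oldGid == newGid {
			continue // shard doesn't move; its shardgrp keeps serving it throughout
		}
		src := shardgrp.MakeClerk(sck.clnt, oldSrvs) // clerk for the source shardgrp
		dst := shardgrp.MakeClerk(sck.clnt, newSrvs) // clerk for the destination shardgrp
		state := src.FreezeShard(sh, newCfg.Num)     // 1) source stops accepting Puts for sh
		dst.InstallShard(sh, state, newCfg.Num)      // 2) destination installs the shard data
		src.DeleteShard(sh, newCfg.Num)              // 3) source discards the frozen shard
	}
	sck.postConfig(newCfg) // 4) hypothetical helper: store newCfg in kvsrv so clerks see it
}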
Implement ChangeConfigTo in shardctrler/shardctrler.go and extend shardgrp to support freeze, install, and delete. Implement FreezeShard, InstallShard, DeleteShard in shardgrp/client.go and shardgrp/server.go using the RPCs in shardgrp/shardrpc, and reject old RPCs based on Num. Modify the shardkv clerk in shardkv1/client.go to handle ErrWrongGroup (returned when the shardgrp is not responsible for the shard). Pass JoinBasic and DeleteBasic first (joining groups; leaving can come next).
- Run FreezeShard, InstallShard, DeleteShard through your rsm package, like Put and Get.
- If an RPC reply includes a map that is part of server state, you may get races; include a copy of the map in the reply.
- You can send an entire map in an RPC request/reply to keep shard transfer code simple.
- A shardgrp should return ErrWrongGroup for a Put/Get whose key's shard is not assigned to it; shardkv1/client.go should reread the configuration and retry.
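On the shardgrp side, the Num fencing and the map-copy hint combine into something like the following sketch of the operation your rsm applies for FreezeShard; the argument/reply fields other than Num and the server state fields (maxNum, frozen, copyShard) are placeholders:

// Sketch: executed when rsm applies a FreezeShard operation.
func (kv *KVServer) freezeShard(args *shardrpc.FreezeShardArgs, reply *shardrpc.FreezeShardReply) {
	if args.Num < kv.maxNum[args.Shard] { // stale RPC from an older configuration
		reply.Err = rpc.ErrWrongGroup // or a dedicated "stale config" error of your choosing
		return
	}
	kv.maxNum[args.Shard] = args.Num
	kv.frozen[args.Shard] = true // from now on, reject Puts for keys in this shard
	// Reply with a copy of the shard's key/value data: returning the live map
	// would race with later modifications of server state.
	reply.State = kv.copyShard(args.Shard)
}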
4. Shardgrps that leave (TestJoinLeaveBasic5A)
Extend ChangeConfigTo to handle shardgrps that leave (in current config but not in the new one). Pass TestJoinLeaveBasic5A.
5. All Part A tests
Your solution must continue serving shards that are not affected by an ongoing configuration change. Pass all Part A tests:
$ cd ~/6.5840/src/shardkv1
$ go test -run 5A
Test (5A): Init and Query ... (reliable network)...
... Passed -- time 0.0s #peers 1 #RPCs 3 #Ops 0
Test (5A): one shard group ... (reliable network)...
... Passed -- time 5.1s #peers 1 #RPCs 792 #Ops 180
Test (5A): a group joins... (reliable network)...
... Passed -- time 12.9s #peers 1 #RPCs 6300 #Ops 180
...
Test (5A): many concurrent clerks unreliable... (unreliable network)...
... Passed -- time 25.3s #peers 1 #RPCs 7553 #Ops 1896
PASS
ok 6.5840/shardkv1 243.115s
Part B: Handling a Failed Controller (easy)
The controller is short-lived and may fail or lose connectivity while moving shards. Your job in this part is to recover: when the tester starts a new controller, that controller must finish the reconfiguration that the previous one started. The tester calls InitController when starting a controller; you can implement recovery there.
Approach: Keep two configurations in the controller's kvsrv: current and next. When a controller starts a reconfiguration, it stores the next configuration. When it completes, it makes next the current. In InitController, check if there is a stored next configuration with a higher Num than current; if so, complete the shard moves to reconfigure to that next config.
A controller that continues from a failed one may repeat FreezeShard, InstallShard, and DeleteShard RPCs; shardgrps can use Num to detect these duplicates and reject them.
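A sketch of that check, assuming the current and next configurations live under two separate kvsrv keys; the key names and helper methods here are hypothetical:

const (
	curKey  = "config"      // hypothetical key for the current configuration
	nextKey = "config-next" // hypothetical key for an in-progress next configuration
)

func (sck *ShardCtrler) InitController() {
	cur := sck.Query()     // current configuration (from curKey)
	nxt := sck.queryNext() // hypothetical helper: read and decode nextKey, nil if absent
	if nxt != nil && nxt.Num > cur.Num {
		// A previous controller posted a next configuration but didn't finish:
		// redo the shard moves; shardgrps use Num to ignore duplicated RPCs.
		sck.moveShards(cur, nxt) // hypothetical helper shared with ChangeConfigTo
		sck.postCurrent(nxt)     // make the next configuration the current one
	}
}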
Implement this in the shardctrler. You're done when you pass the Part B tests:
$ cd ~/6.5840/src/shardkv1
$ go test -run 5B
Test (5B): Join/leave while a shardgrp is down... (reliable network)...
... Passed -- time 9.2s #peers 1 #RPCs 899 #Ops 120
Test (5B): recover controller ... (reliable network)...
... Passed -- time 26.4s #peers 1 #RPCs 3724 #Ops 360
PASS
ok 6.5840/shardkv1 35.805s
- Implement recovery in InitController in shardctrler/shardctrler.go.
Part C: Concurrent Configuration Changes (moderate)
Modify the controller to allow concurrent controllers. When one crashes or is partitioned, the tester starts a new one, which must finish any in-progress work (as in Part B). So several controllers may run concurrently and send RPCs to shardgrps and to the kvsrv.
Challenge: Ensure controllers don't step on each other. In Part A you already fenced shardgrp RPCs with Num so old RPCs are rejected; duplicate work from multiple controllers is therefore safe. The remaining issue is that only one controller should update the next configuration, so that two controllers (e.g., a partitioned old one and a newly started one) don't write different configs for the same Num. The tester runs several controllers concurrently; each reads the current config, updates it for a join/leave, then calls ChangeConfigTo, so multiple controllers may call ChangeConfigTo with different configs that share the same Num. You can use version numbers and versioned Puts so that only one controller successfully posts the next config and the others return without doing anything.
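For example, a controller might try to post the next configuration with a conditional, versioned Put and back off if another controller got there first. A sketch, reusing the hypothetical nextKey from the Part B sketch; the helper name and error constants are assumptions:

// Sketch: only the controller whose Put succeeds at the expected version wins.
func (sck *ShardCtrler) postNext(nextCfg *shardcfg.ShardConfig) bool {
	_, ver, _ := sck.IKVClerk.Get(nextKey)                  // version of the stored next config
	err := sck.IKVClerk.Put(nextKey, nextCfg.String(), ver) // conditional on that version
	// rpc.ErrVersion means another controller already posted a next configuration
	// for this Num (handling ErrMaybe is left out of this sketch).
	return err == rpc.OK
}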
Modify the controller so that only one controller can post a next configuration for a given configuration Num. Pass the concurrent tests:
$ cd ~/6.5840/src/shardkv1
$ go test -run TestConcurrentReliable5C
Test (5C): Concurrent ctrlers ... (reliable network)...
... Passed -- time 8.2s #peers 1 #RPCs 1753 #Ops 120
PASS
ok 6.5840/shardkv1 8.364s
$ go test -run TestAcquireLockConcurrentUnreliable5C
Test (5C): Concurrent ctrlers ... (unreliable network)...
... Passed -- time 23.8s #peers 1 #RPCs 1850 #Ops 120
PASS
ok 6.5840/shardkv1 24.008s
- See concurCtrler in test.go for how the tester runs controllers concurrently.
Recovery + new controller: A new controller should still perform Part B recovery. If the old controller was partitioned during ChangeConfigTo, ensure the old one doesn't interfere with the new one. If all controller updates are properly fenced with Num (from Part B), you may not need extra code. Pass the Partition tests:
$ go test -run Partition
Test (5C): partition controller in join... (reliable network)...
... Passed -- time 7.8s #peers 1 #RPCs 876 #Ops 120
...
Test (5C): controllers with leased leadership ... (unreliable network)...
... Passed -- time 60.5s #peers 1 #RPCs 11422 #Ops 2336
PASS
ok 6.5840/shardkv1 217.779s
Rerun all tests to ensure recent controller changes didn't break earlier parts.
Gradescope will run Lab 3A–D, Lab 4A–C, and 5C tests. Before submitting:
$ go test ./raft1
$ go test ./kvraft1
$ go test ./shardkv1
Part D: Extend Your Solution
In this final part you may extend your solution in any way you like. You must write your own tests for your extensions.
Implement one of the ideas below or your own. Write a paragraph in extension.md describing your extension and upload extension.md to Gradescope. For harder, open-ended extensions, you may partner with another student.
Ideas (first few easier, later more open-ended):
- (hard) Modify shardkv to support transactions (several Puts and Gets atomically across shards). Implement two-phase commit and two-phase locking. Write tests.
- (hard) Support transactions in kvraft (several Put/Get atomically). Then versioned Puts are unnecessary. See etcd's transactions. Write tests.
- (hard) Let the kvraft leader serve Gets without going through rsm (optimization at end of Section 8 of the Raft paper, including leases), preserving linearizability. Pass existing kvraft tests. Add a test that optimized Gets are faster (e.g. fewer RPCs) and a test that term switches are slower (new leader waits for lease expiry).
- (moderate) Add a Range function to kvsrv (keys from low to high). Lazy: iterate the key/value map; better: data structure for range search (e.g. B-tree). Include a test that fails the lazy solution but passes the better one.
- (moderate) Change kvsrv to exactly-once Put/Get semantics (e.g. Lab 2 dropped-messages style). Implement exactly-once in kvraft as well. You may port tests from 2024.
- (easy) Change the tester to use kvraft instead of kvsrv for the controller (e.g. replace kvsrv.StartKVServer in MakeTestMaxRaft in test.go with kvraft.StartKVServer). Write a test that the controller can query/update configuration while one kvraft peer is down. Tester code lives in src/kvtest1, src/shardkv1, src/tester1.
Handin Procedure
Before submitting, run all tests one final time:
$ go test ./raft1
$ go test ./kvraft1
$ go test ./shardkv1