# 6.5840 Lab 5: Sharded Key/Value Service

## Introduction

You can either do a final project based on your own ideas, or this lab.

In this lab you'll build a key/value storage system that **"shards,"** or partitions, the keys over a set of Raft-replicated key/value server groups (**shardgrps**). A **shard** is a subset of the key/value pairs; for example, all the keys starting with "a" might be one shard, all the keys starting with "b" another, etc. The reason for sharding is **performance**. Each shardgrp handles puts and gets for just a few of the shards, and the shardgrps operate in parallel; thus total system throughput (puts and gets per unit time) increases in proportion to the number of shardgrps.
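
As a concrete illustration of partitioning keys into shards, here is a minimal sketch in Go; the shard count and the first-byte scheme are assumptions for illustration, and the lab's real mapping lives in the shardcfg package:

```go
package main

import "fmt"

// NShards is a made-up shard count; the real constant lives in shardcfg.
const NShards = 12

// key2shard maps a key to a shard number by its first byte, which gives
// the "keys starting with 'a' in one shard" behavior described above.
func key2shard(key string) int {
	shard := 0
	if len(key) > 0 {
		shard = int(key[0])
	}
	return shard % NShards
}

func main() {
	// "a" and "b" land in different shards, so different shardgrps
	// can serve them in parallel.
	fmt.Println(key2shard("a"), key2shard("b"))
}
```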

The sharded key/value service has the components shown in the lab diagram. **Shardgrps** (blue squares) store shards with keys: shardgrp 1 holds a shard storing key "a", and shardgrp 2 holds a shard storing key "b". Clients interact with the service through a **clerk** (green circle), which implements **Get** and **Put** methods. To find the shardgrp for a key passed to Put/Get, the clerk gets the **configuration** from the **kvsrv** (black square), which you implemented in **Lab 2**. The configuration describes the mapping from shards to shardgrps (e.g., shard 1 is served by shardgrp 3).

An administrator (i.e., the tester) uses another client, the **controller** (purple circle), to add/remove shardgrps from the cluster and update which shardgrp should serve a shard. The controller has one main method: **ChangeConfigTo**, which takes as argument a new configuration and changes the system from the current configuration to the new configuration; this involves moving shards to new shardgrps that are joining and moving shards away from shardgrps that are leaving. To do so the controller 1) makes RPCs (**FreezeShard**, **InstallShard**, and **DeleteShard**) to shardgrps, and 2) updates the configuration stored in kvsrv.

The reason for the controller is that a sharded storage system must be able to **shift shards among shardgrps**: for load balancing, or when shardgrps join and leave (new capacity, repair, retirement).

The main challenges in this lab will be ensuring **linearizability** of Get/Put operations while handling 1) changes in the assignment of shards to shardgrps, and 2) recovering from a controller that fails or is partitioned during **ChangeConfigTo**.

1. **If ChangeConfigTo fails while reconfiguring**, some shards may be inaccessible if they have started but not completed moving from one shardgrp to another. The tester starts a new controller; your job is to ensure that the new one completes the reconfiguration that the previous controller started.
2. **ChangeConfigTo moves shards** from one shardgrp to another. You must ensure that **at most one shardgrp is serving requests for each shard at any one time**, so that clients using old vs new shardgrp don't break linearizability.

This lab uses "configuration" to refer to the **assignment of shards to shardgrps**. This is **not** the same as Raft cluster membership changes; you don't have to implement Raft cluster membership changes.

A shardgrp server is a member of only a single shardgrp. The set of servers in a given shardgrp will never change.

**Only RPC** may be used for interaction among clients and servers (no shared Go variables or files).

* **Part A**: Implement a working **shardctrler** (store/retrieve configurations in kvsrv), the **shardgrp** (replicated with Raft rsm), and a **shardgrp clerk**. The shardctrler talks to shardgrp clerks to move shards.
* **Part B**: Modify shardctrler to handle failures and partitions during config changes.
* **Part C**: Allow concurrent controllers without interfering with each other.
* **Part D**: Extend your solution in any way you like (optional).

This lab's design is in the same general spirit as Flat Datacenter Storage, BigTable, Spanner, FAWN, Apache HBase, Rosebud, Spinnaker, and others (details differ).

Lab 5 will use your **kvsrv from Lab 2**, and your **rsm and Raft from Lab 4**. Lab 5 and Lab 4 must use the same rsm and Raft implementations.

You may use late hours for **Part A** only; you may **not** use late hours for Parts B–D.

---

## Getting Started

Do a `git pull` to get the latest lab software.

We supply you with tests and skeleton code in **src/shardkv1**:

* **shardctrler** package: `shardctrler.go` with methods for the controller to change a configuration (**ChangeConfigTo**) and to get a configuration (**Query**)
* **shardgrp** package: shardgrp clerk and server
* **shardcfg** package: for computing shard configurations
* **client.go**: shardkv clerk

To get up and running:

```bash
$ cd ~/6.5840
$ git pull
...
$ cd src/shardkv1
$ go test -v
=== RUN TestInitQuery5A
Test (5A): Init and Query ... (reliable network)...
shardkv_test.go:46: Static wrong null 0
...
```

---

## Part A: Moving Shards (hard)

Your first job is to implement shardgrps and the **InitConfig**, **Query**, and **ChangeConfigTo** methods when there are no failures. The code for describing a configuration is in **shardkv1/shardcfg**. Each **shardcfg.ShardConfig** has a unique identifying number **Num**, a mapping from shard number to group number, and a mapping from group number to the list of servers replicating that group. There will usually be more shards than groups so that load can be shifted at a fairly fine granularity.
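
A configuration with those three parts might look roughly like the following sketch; the field names `Shards` and `Groups` and the toy shard count are guesses, and the real definition is in shardkv1/shardcfg:

```go
package main

import "fmt"

type Tgid int // group identifier

const NShards = 4 // toy value; the lab uses more shards than groups

// ShardConfig mirrors the description above: a unique Num, a
// shard->group mapping, and a group->servers mapping.
type ShardConfig struct {
	Num    int               // unique, monotonically increasing config number
	Shards [NShards]Tgid     // shard number -> gid serving that shard
	Groups map[Tgid][]string // gid -> servers replicating that group
}

func main() {
	cfg := ShardConfig{
		Num:    1,
		Shards: [NShards]Tgid{1, 1, 2, 2},
		Groups: map[Tgid][]string{
			1: {"srv-1a", "srv-1b", "srv-1c"},
			2: {"srv-2a", "srv-2b", "srv-2c"},
		},
	}
	// Look up which group serves shard 3, then that group's servers.
	gid := cfg.Shards[3]
	fmt.Println(gid, cfg.Groups[gid])
}
```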

### 1. InitConfig and Query (no shardgrps yet)

Implement in **shardctrler/shardctrler.go**:

* **InitConfig**: receives the first configuration (a **shardcfg.ShardConfig** from the tester) and stores it in an instance of Lab 2's **kvsrv**.
* **Query**: returns the current configuration; read it from kvsrv (stored there by InitConfig).

Use **ShardCtrler.IKVClerk** Get/Put to talk to kvsrv, **ShardConfig.String()** to serialize for Put, and **shardcfg.FromString()** to deserialize. You're done when you pass the first test:
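
A minimal sketch of InitConfig and Query along those lines, assuming a map-backed stand-in for the kvsrv clerk, a JSON stand-in for String()/FromString(), and a made-up key name:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ShardConfig is reduced to two fields here; the real type is in shardcfg.
type ShardConfig struct {
	Num    int
	Shards []int
}

// String serializes the config for a Put (the real code uses
// ShardConfig.String and shardcfg.FromString).
func (c *ShardConfig) String() string {
	b, _ := json.Marshal(c)
	return string(b)
}

func fromString(s string) *ShardConfig {
	c := &ShardConfig{}
	json.Unmarshal([]byte(s), c)
	return c
}

// fakeClerk imitates a kvsrv clerk's Get/Put with an in-memory map.
type fakeClerk struct{ m map[string]string }

func (ck *fakeClerk) Put(k, v string) { ck.m[k] = v }
func (ck *fakeClerk) Get(k string) (string, bool) {
	v, ok := ck.m[k]
	return v, ok
}

const configKey = "config" // assumed key name for the stored config

// InitConfig stores the first configuration in the kvsrv.
func InitConfig(ck *fakeClerk, cfg *ShardConfig) {
	ck.Put(configKey, cfg.String())
}

// Query reads back the current configuration.
func Query(ck *fakeClerk) *ShardConfig {
	s, _ := ck.Get(configKey)
	return fromString(s)
}

func main() {
	ck := &fakeClerk{m: map[string]string{}}
	InitConfig(ck, &ShardConfig{Num: 1, Shards: []int{1, 1}})
	fmt.Println(Query(ck).Num)
}
```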

```bash
$ cd ~/6.5840/src/shardkv1
$ go test -run TestInitQuery5A
Test (5A): Init and Query ... (reliable network)...
... Passed -- time 0.0s #peers 1 #RPCs 3 #Ops 0
PASS
ok 6.5840/shardkv1 0.197s
```

### 2. Shardgrp and shardkv clerk (Static test)

Implement an initial version of **shardgrp** in **shardkv1/shardgrp/server.go** and a **shardgrp clerk** in **shardkv1/shardgrp/client.go** by copying from your Lab 4 kvraft solution. Implement the **shardkv clerk** in **shardkv1/client.go** that uses **Query** to find the shardgrp for a key, then talks to that shardgrp. You're done when you pass the **Static** test.

* Upon creation, the first shardgrp (**shardcfg.Gid1**) should initialize itself to **own all shards**.
* **shardkv1/client.go**'s Put must return **ErrMaybe** when its reply may have been lost; the inner (shardgrp) Put can signal this with an error.
* To put/get a key from a shardgrp, the shardkv clerk should create a shardgrp clerk via **shardgrp.MakeClerk**, passing the servers from the configuration and the shardkv clerk's **ck.clnt**. Use **ShardConfig.GidServers()** to get the group for a shard.
* Use **shardcfg.Key2Shard()** to find the shard number for a key. The tester passes a **ShardCtrler** to **MakeClerk** in **shardkv1/client.go**; use **Query** to get the current configuration.
* You can copy Put/Get and related code from kvraft **client.go** and **server.go**.
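
The routing the bullets above describe can be sketched as follows; the config type, the in-memory groups, and the first-byte shard function are all stand-ins for the real shardcfg/shardgrp code:

```go
package main

import "fmt"

const NShards = 4 // toy shard count

// config is a stand-in: a shard->gid table plus one in-memory
// "shardgrp" (a key/value map) per gid.
type config struct {
	shards [NShards]int              // shard -> gid (like ShardConfig.Shards)
	groups map[int]map[string]string // gid -> that group's key/value data
}

// key2shard uses the same first-byte idea as shardcfg.Key2Shard.
func key2shard(key string) int {
	if len(key) == 0 {
		return 0
	}
	return int(key[0]) % NShards
}

// Get routes like the shardkv clerk: find the key's shard, find the
// gid serving that shard in the config, then ask that group.
func Get(cfg *config, key string) (string, bool) {
	gid := cfg.shards[key2shard(key)]
	v, ok := cfg.groups[gid][key]
	return v, ok
}

func main() {
	cfg := &config{
		shards: [NShards]int{1, 1, 2, 2},
		groups: map[int]map[string]string{1: {}, 2: {}},
	}
	// "a" is byte 97; 97 % 4 == 1, so shard 1, which gid 1 serves.
	cfg.groups[1]["a"] = "x"
	v, ok := Get(cfg, "a")
	fmt.Println(v, ok)
}
```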

```bash
$ cd ~/6.5840/src/shardkv1
$ go test -run Static
Test (5A): one shard group ... (reliable network)...
... Passed -- time 5.4s #peers 1 #RPCs 793 #Ops 180
PASS
ok 6.5840/shardkv1 5.632s
```

### 3. ChangeConfigTo and shard movement (JoinBasic, DeleteBasic)

Support **movement of shards among groups** by implementing **ChangeConfigTo**: it changes from an old configuration to a new one. The new configuration may add new shardgrps or remove existing ones. The controller must move shard **data** so that each shardgrp's stored shards match the new configuration.

**Suggested approach for moving a shard:**

1. **Freeze** the shard at the source shardgrp (that shardgrp rejects Puts for keys in the moving shard).
2. **Install** (copy) the shard to the destination shardgrp.
3. **Delete** the frozen shard at the source.
4. **Post** the new configuration so clients can find the moved shard.

This avoids direct shardgrp-to-shardgrp interaction and allows serving shards not involved in the change.
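
The four steps above can be sketched like this, with each shardgrp reduced to an in-memory map and the RPC layer, Raft, and Num fencing elided:

```go
package main

import "fmt"

// shardgrp is a toy stand-in: shard number -> that shard's key/value data.
type shardgrp struct {
	shards map[int]map[string]string
	frozen map[int]bool // frozen shards would reject Puts
}

func newGrp() *shardgrp {
	return &shardgrp{shards: map[int]map[string]string{}, frozen: map[int]bool{}}
}

// moveShard walks the suggested steps: freeze at src, install at dst,
// delete at src. Posting the new configuration (step 4) is left to the
// caller, which would write it to kvsrv.
func moveShard(src, dst *shardgrp, shard int) {
	src.frozen[shard] = true // 1. freeze: src now rejects Puts for this shard
	data := map[string]string{}
	for k, v := range src.shards[shard] {
		data[k] = v // copy, so src's state isn't shared with dst
	}
	dst.shards[shard] = data  // 2. install at the destination
	delete(src.shards, shard) // 3. delete the frozen shard at the source
	delete(src.frozen, shard)
}

func main() {
	g1, g2 := newGrp(), newGrp()
	g1.shards[7] = map[string]string{"a": "x"}
	moveShard(g1, g2, 7)
	fmt.Println(g2.shards[7]["a"], len(g1.shards))
}
```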

**Ordering:** Each configuration has a unique **Num** (see **shardcfg/shardcfg.go**). In Part A the tester calls ChangeConfigTo sequentially; the new config has **Num** one larger than the previous. To reject stale RPCs, **FreezeShard**, **InstallShard**, and **DeleteShard** should include **Num** (see **shardgrp/shardrpc/shardrpc.go**), and shardgrps must remember the **largest Num** they have seen for each shard.

Implement **ChangeConfigTo** in **shardctrler/shardctrler.go** and extend shardgrp to support **freeze**, **install**, and **delete**. Implement **FreezeShard**, **InstallShard**, **DeleteShard** in **shardgrp/client.go** and **shardgrp/server.go** using the RPCs in **shardgrp/shardrpc**, and reject old RPCs based on Num. Modify the shardkv clerk in **shardkv1/client.go** to handle **ErrWrongGroup** (returned when the shardgrp is not responsible for the shard). Pass **JoinBasic** and **DeleteBasic** first (joining groups; leaving can come next).

* Run **FreezeShard**, **InstallShard**, **DeleteShard** through your **rsm** package, like Put and Get.
* If an RPC reply includes a **map** that is part of server state, you may get races; **include a copy** of the map in the reply.
* You can send an entire map in an RPC request/reply to keep shard transfer code simple.
* A shardgrp should return **ErrWrongGroup** for a Put/Get whose key's shard is not assigned to it; **shardkv1/client.go** should reread the configuration and retry.
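
The per-shard Num fencing described above might look like the following sketch; the real handlers of course run through rsm and carry full RPC arguments:

```go
package main

import "fmt"

// server keeps, per shard, the largest config Num it has seen, and uses
// it to reject RPCs from stale reconfigurations.
type server struct {
	maxNum map[int]int // shard -> largest Num seen
}

// freezeShard returns false (rejects) if the RPC carries a stale Num.
// A repeat with an equal Num is accepted here; freezing is idempotent,
// and Part B's duplicate detection can tighten this.
func (s *server) freezeShard(shard, num int) bool {
	if num < s.maxNum[shard] {
		return false // stale RPC from an old reconfiguration
	}
	s.maxNum[shard] = num
	// ... actually freeze the shard here ...
	return true
}

func main() {
	s := &server{maxNum: map[int]int{}}
	fmt.Println(s.freezeShard(3, 2)) // first attempt at Num 2: accepted
	fmt.Println(s.freezeShard(3, 1)) // late RPC from Num 1: rejected
}
```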

### 4. Shardgrps that leave (TestJoinLeaveBasic5A)

Extend **ChangeConfigTo** to handle shardgrps that **leave** (in current config but not in the new one). Pass **TestJoinLeaveBasic5A**.

### 5. All Part A tests

Your solution must **continue serving shards that are not affected** by an ongoing configuration change. Pass all Part A tests:

```bash
$ cd ~/6.5840/src/shardkv1
$ go test -run 5A
Test (5A): Init and Query ... (reliable network)...
... Passed -- time 0.0s #peers 1 #RPCs 3 #Ops 0
Test (5A): one shard group ... (reliable network)...
... Passed -- time 5.1s #peers 1 #RPCs 792 #Ops 180
Test (5A): a group joins... (reliable network)...
... Passed -- time 12.9s #peers 1 #RPCs 6300 #Ops 180
...
Test (5A): many concurrent clerks unreliable... (unreliable network)...
... Passed -- time 25.3s #peers 1 #RPCs 7553 #Ops 1896
PASS
ok 6.5840/shardkv1 243.115s
```

---

## Part B: Handling a Failed Controller (easy)

The controller is short-lived and may **fail or lose connectivity** while moving shards. The task is to **recover** when a new controller is started: the new controller must **finish the reconfiguration** that the previous one started. The tester calls **InitController** when starting a controller; you can implement recovery there.

**Approach:** Keep **two configurations** in the controller's kvsrv: **current** and **next**. When a controller starts a reconfiguration, it stores the next configuration. When it completes, it makes next the current. In **InitController**, check if there is a stored **next** configuration with a higher Num than current; if so, **complete the shard moves** to reconfigure to that next config.

A controller that continues from a failed one may **repeat** FreezeShard, InstallShard, and DeleteShard RPCs; shardgrps can use **Num** to detect duplicates and reject them.
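
A sketch of the current/next bookkeeping, with a configuration reduced to its Num and the kvsrv state reduced to a struct:

```go
package main

import "fmt"

// ctrlerState stands in for the two configurations stored in kvsrv;
// a config is reduced to just its Num here.
type ctrlerState struct {
	current int
	next    int
}

// initController is where recovery happens: if a previous controller
// posted a next config but died before finishing, finish for it.
func initController(st *ctrlerState) {
	if st.next > st.current {
		// ... redo the FreezeShard/InstallShard/DeleteShard moves here;
		// shardgrps drop the duplicates using Num ...
		st.current = st.next // make next the current config
	}
}

func main() {
	// A controller crashed mid-reconfiguration: next was posted,
	// but current was never updated.
	st := &ctrlerState{current: 3, next: 4}
	initController(st)
	fmt.Println(st.current)
}
```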

Implement this in the shardctrler. You're done when you pass the Part B tests:

```bash
$ cd ~/6.5840/src/shardkv1
$ go test -run 5B
Test (5B): Join/leave while a shardgrp is down... (reliable network)...
... Passed -- time 9.2s #peers 1 #RPCs 899 #Ops 120
Test (5B): recover controller ... (reliable network)...
... Passed -- time 26.4s #peers 1 #RPCs 3724 #Ops 360
PASS
ok 6.5840/shardkv1 35.805s
```

* Implement recovery in **InitController** in **shardctrler/shardctrler.go**.

---

## Part C: Concurrent Configuration Changes (moderate)

Modify the controller to allow **concurrent controllers**. When one crashes or is partitioned, the tester starts a new one, which must finish any in-progress work (as in Part B). So several controllers may run concurrently and send RPCs to shardgrps and to the kvsrv.

**Challenge:** Ensure controllers **don't step on each other**. In Part A you already fenced shardgrp RPCs with **Num** so old RPCs are rejected; duplicate work from multiple controllers is safe. The remaining issue is that **only one controller** should update the **next** configuration, so two controllers (e.g. a partitioned one and a new one) don't write different configs for the same Num.

The tester runs several controllers concurrently; each reads the current config, updates it for a join/leave, then calls ChangeConfigTo, so multiple controllers may call ChangeConfigTo with **different configs with the same Num**. You can use **version numbers and versioned Puts** so that only one controller successfully posts the next config and the others return without doing anything.
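
The versioned-Put idea can be sketched as a compare-and-swap; this in-memory store is a stand-in for kvsrv's conditional Put:

```go
package main

import "fmt"

// versionedStore mimics a versioned key: a Put succeeds only if the
// caller's version matches the stored one.
type versionedStore struct {
	val     string
	version int
}

// putIfVersion returns true iff it installed val. A controller whose
// Put fails knows another controller already posted that next config
// and can return without doing anything.
func (s *versionedStore) putIfVersion(val string, version int) bool {
	if version != s.version {
		return false
	}
	s.val = val
	s.version++
	return true
}

func main() {
	s := &versionedStore{}
	// Two controllers race to post different configs for the same Num.
	fmt.Println(s.putIfVersion("cfg-5-by-A", 0)) // winner
	fmt.Println(s.putIfVersion("cfg-5-by-B", 0)) // loser: stale version
	fmt.Println(s.val)
}
```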

Modify the controller so that **only one controller can post a next configuration for a given configuration Num**. Pass the concurrent tests:

```bash
$ cd ~/6.5840/src/shardkv1
$ go test -run TestConcurrentReliable5C
Test (5C): Concurrent ctrlers ... (reliable network)...
... Passed -- time 8.2s #peers 1 #RPCs 1753 #Ops 120
PASS
ok 6.5840/shardkv1 8.364s

$ go test -run TestAcquireLockConcurrentUnreliable5C
Test (5C): Concurrent ctrlers ... (unreliable network)...
... Passed -- time 23.8s #peers 1 #RPCs 1850 #Ops 120
PASS
ok 6.5840/shardkv1 24.008s
```

* See **concurCtrler** in **test.go** for how the tester runs controllers concurrently.

**Recovery + new controller:** A new controller should still perform Part B recovery. If the old controller was partitioned during ChangeConfigTo, ensure the old one doesn't interfere with the new one. If all controller updates are properly fenced with Num (from Part B), you may not need extra code. Pass the **Partition** tests:

```bash
$ go test -run Partition
Test (5C): partition controller in join... (reliable network)...
... Passed -- time 7.8s #peers 1 #RPCs 876 #Ops 120
...
Test (5C): controllers with leased leadership ... (unreliable network)...
... Passed -- time 60.5s #peers 1 #RPCs 11422 #Ops 2336
PASS
ok 6.5840/shardkv1 217.779s
```

Rerun all tests to ensure recent controller changes didn't break earlier parts.

**Gradescope** will run Lab 3A–D, Lab 4A–C, and 5C tests. Before submitting:

```bash
$ go test ./raft1
$ go test ./kvraft1
$ go test ./shardkv1
```

---

## Part D: Extend Your Solution

In this final part you may **extend your solution** in any way you like. You must **write your own tests** for your extensions.

Implement one of the ideas below or your own. Write a **paragraph in extension.md** describing your extension and upload **extension.md** to Gradescope. For harder, open-ended extensions, you may partner with another student.

**Ideas (hardest first, easier ones later):**

* **(hard)** Modify shardkv to support **transactions** (several Puts and Gets atomically across shards). Implement two-phase commit and two-phase locking. Write tests.
* **(hard)** Support **transactions in kvraft** (several Put/Get atomically). Then versioned Puts are unnecessary. See [etcd's transactions](https://etcd.io/docs/v3.4/learning/api/). Write tests.
* **(hard)** Let the kvraft **leader serve Gets without going through rsm** (optimization at end of Section 8 of the Raft paper, including **leases**), preserving linearizability. Pass existing kvraft tests. Add a test that optimized Gets are faster (e.g. fewer RPCs) and a test that term switches are slower (new leader waits for lease expiry).
* **(moderate)** Add a **Range** function to kvsrv (keys from low to high). Lazy: iterate the key/value map; better: a data structure for range search (e.g. a B-tree). Include a test that fails the lazy solution but passes the better one.
* **(moderate)** Change kvsrv to **exactly-once** Put/Get semantics (e.g. Lab 2 dropped-messages style). Implement exactly-once in kvraft as well. You may port tests from 2024.
* **(easy)** Change the tester to use **kvraft** instead of kvsrv for the controller (e.g. replace kvsrv.StartKVServer in MakeTestMaxRaft in test.go with kvraft.StartKVServer). Write a test that the controller can query/update configuration while one kvraft peer is down. Tester code lives in **src/kvtest1**, **src/shardkv1**, **src/tester1**.

---

## Handin Procedure

Before submitting, run all tests one final time:

```bash
$ go test ./raft1
$ go test ./kvraft1
$ go test ./shardkv1
```

---
*From: [6.5840 Lab 5: Sharded Key/Value Service](https://pdos.csail.mit.edu/6.824/labs/lab-shard1.html)*