# Proposal: Use aos-dev/go-storage to replace storage.ExternalStorage

## Background

[dumpling][dumping] uses `storage.ExternalFileWriter` to support data export. `storage.ExternalFileWriter` exposes the following API:

```golang
type ExternalFileWriter interface {
	// Write writes to buffer and if chunk is filled will upload it
	Write(ctx context.Context, p []byte) (int, error)
	// Close writes final chunk and completes the upload
	Close(ctx context.Context) error
}
```

To support multipart uploads, `storage.ExternalStorage` creates a struct that carries the upload ID and the completed parts:

```golang
type S3Uploader struct {
	svc           s3iface.S3API
	createOutput  *s3.CreateMultipartUploadOutput
	completeParts []*s3.CompletedPart
}
```

`S3Uploader` creates a new part on every call to `Write` and completes the multipart upload in `Close`.

Based on this design, [dumpling][dumping]'s main data export logic is as follows:

```golang
func WriteInsert(pCtx *tcontext.Context, cfg *Config, meta TableMeta, tblIR TableDataIR, w storage.ExternalFileWriter) (n uint64, err error) {
	...
	wp := newWriterPipe(w, cfg.FileSize, cfg.StatementSize, cfg.Labels)
	...
	for fileRowIter.HasNext() {
		...
		for fileRowIter.HasNext() {
			lastBfSize := bf.Len()
			if selectedField != "" {
				if err = fileRowIter.Decode(row); err != nil {
					pCtx.L().Error("fail to scan from sql.Row", zap.Error(err))
					return counter, errors.Trace(err)
				}
				row.WriteToBuffer(bf, escapeBackslash)
			} else {
				bf.WriteString("()")
			}
			counter++
			wp.AddFileSize(uint64(bf.Len()-lastBfSize) + 2) // 2 is for ",\n" and ";\n"
			...
			fileRowIter.Next()
			shouldSwitch := wp.ShouldSwitchStatement()
			if fileRowIter.HasNext() && !shouldSwitch {
				bf.WriteString(",\n")
			} else {
				bf.WriteString(";\n")
			}
			if bf.Len() >= lengthLimit {
				select {
				case <-pCtx.Done():
					return counter, pCtx.Err()
				case err = <-wp.errCh:
					return counter, err
				case wp.input <- bf:
					bf = pool.Get().(*bytes.Buffer)
					if bfCap := bf.Cap(); bfCap < lengthLimit {
						bf.Grow(lengthLimit - bfCap)
					}
					AddCounter(finishedRowsCounter, cfg.Labels, float64(counter-lastCounter))
					lastCounter = counter
				}
			}
			if shouldSwitch {
				break
			}
		}
		if wp.ShouldSwitchFile() {
			break
		}
	}
	...
	if bf.Len() > 0 {
		wp.input <- bf
	}
	close(wp.input)
	<-wp.closed
	...
	return counter, wp.Error()
}
```

[dumpling][dumping] fills a buffer and calls `ExternalFileWriter.Write` each time the buffer reaches `lengthLimit`, which is 1048576 bytes (1 MiB).

## Proposal

Integrating with every storage service is a real burden for an application, especially one with complicated business logic. So I propose using [aos-dev/go-storage] to replace `storage.ExternalStorage`.

[aos-dev/go-storage] is an application-oriented unified storage layer for Golang. Its design goals are **Production ready**, **High performance** and **Vendor agnostic**. go-storage aims to support as many services as possible, including S3, GCS, OSS, COS, Kodo (qiniu), QingStor, and even Dropbox (contributed by the community).

### Benefits

- go-storage is maintained by a dedicated team focused on storage, and is licensed under [Apache-2.0](https://github.com/aos-dev/go-storage/blob/master/LICENSE).
- go-storage supports 10 storage services, and more may be added in the future.
- go-storage has all services tested via CI: https://github.com/aos-dev/go-service-s3/actions/workflows/intergration-test.yml
- go-storage is a general storage layer designed for different workloads, so it should not limit [dumpling][dumping] as its use cases expand.

### Drawbacks

- go-storage needs to support every feature that [dumpling][dumping] currently supports, such as SSE, as described in issue [go-service-s3#51](https://github.com/aos-dev/go-service-s3/issues/51).
- [dumpling][dumping] needs to handle config parsing to construct go-storage's `Storager`.

## Implementations

For the first stage, we can replace just the `Write` and `Close` calls without touching other parts of the project.

- Change config parsing to support constructing go-storage's `Storager`.
- Way A: Use go-storage's [Multiparter](https://github.com/aos-dev/go-storage/blob/master/types/operation.generated.go) to replace `storage.ExternalFileWriter`.
- Way B: Use go-storage to implement `storage.ExternalFileWriter`.

## Rationale

### `io/fs`

`fs.FS` (package `io/fs`) has been part of the standard library since Go 1.16. But `fs.FS` is designed to work with files rather than byte slices or streams, and it lacks support for object storage multipart uploads.

### `spf13/afero`

[afero](https://github.com/spf13/afero) is another filesystem abstraction for Go. As that description implies, it also works with files. There is no official support for S3-like services, but there is a community-built one: [afero-s3](https://github.com/fclairamb/afero-s3/). It uses `S3Manager` in `Write` operations, which means users can't control the underlying multipart upload logic.

---

[dumping]: https://github.com/pingcap/dumpling
[aos-dev/go-storage]: https://github.com/aos-dev/go-storage