14 Commits

Author SHA1 Message Date
147659b0da test and fix bugs found by cline
All checks were successful
build container / build-container (push) Successful in 28m57s
run go test / test (push) Successful in 25m34s
2025-06-10 14:22:43 +08:00
2a0bd28958 use cline and prepare to test the project for further development
Some checks failed
build container / build-container (push) Has been cancelled
2025-06-10 14:14:32 +08:00
835045346d add header checker
Some checks failed
build container / build-container (push) Successful in 33m30s
run go test / test (push) Failing after 3s
2025-04-01 14:09:43 +08:00
85968bb5cf update matcher to path matcher 2025-04-01 11:12:34 +08:00
e14bcb205b rename handlerlog to handlerctx
All checks were successful
build container / build-container (push) Successful in 6m4s
run go test / test (push) Successful in 3m23s
2025-03-03 09:59:19 +08:00
83dfcba4ae move x-accel into local storage configuration
All checks were successful
build container / build-container (push) Successful in 5m45s
run go test / test (push) Successful in 3m15s
2025-03-03 09:15:37 +08:00
504a809c16 add pypi to config
Some checks failed
build container / build-container (push) Failing after 21s
2025-03-03 09:07:26 +08:00
f9ff71f62a add pprof handler
Some checks failed
build container / build-container (push) Failing after 22s
run go test / test (push) Failing after 19s
2025-03-03 00:26:29 +08:00
629c095bbe update golang sdk to 1.24
All checks were successful
build container / build-container (push) Successful in 6m19s
2025-02-25 09:56:30 +08:00
9b34fd29c0 fix goproxy on gitea action
All checks were successful
build container / build-container (push) Successful in 5m21s
2025-02-23 12:13:39 +08:00
3cfa6c6116 improve container build
Some checks failed
build container / build-container (push) Failing after 4m43s
2025-02-23 12:07:05 +08:00
27634d016b reset tcp connection if error happens
All checks were successful
build container / build-container (push) Successful in 4m33s
run go test / test (push) Successful in 3m56s
2025-02-22 23:27:41 +08:00
ab852a6520 make it work across instances
All checks were successful
run go test / test (push) Successful in 12m39s
build container / build-container (push) Successful in 8m3s
2025-02-19 21:30:49 +08:00
18bdc4f54e configure caches
Some checks are pending
build container / build-container (push) Waiting to run
run go test / test (push) Waiting to run
2025-02-19 20:50:21 +08:00
21 changed files with 1867 additions and 107 deletions

.clinerules Normal file

@ -0,0 +1,121 @@
# Project
## User Language
Simplified Chinese
# Cline's Memory Bank
I am Cline, an expert software engineer with a unique characteristic: my memory resets completely between sessions. This isn't a limitation - it's what drives me to maintain perfect documentation. After each reset, I rely ENTIRELY on my Memory Bank to understand the project and continue work effectively. I MUST read ALL memory bank files at the start of EVERY task - this is not optional.
## Memory Bank Structure
The Memory Bank consists of core files and optional context files, all in Markdown format. Files build upon each other in a clear hierarchy:
```mermaid
flowchart TD
    PB[projectbrief.md] --> PC[productContext.md]
    PB --> SP[systemPatterns.md]
    PB --> TC[techContext.md]
    PC --> AC[activeContext.md]
    SP --> AC
    TC --> AC
    AC --> P[progress.md]
```
### Core Files (Required)
1. `projectbrief.md`
- Foundation document that shapes all other files
- Created at project start if it doesn't exist
- Defines core requirements and goals
- Source of truth for project scope
2. `productContext.md`
- Why this project exists
- Problems it solves
- How it should work
- User experience goals
3. `activeContext.md`
- Current work focus
- Recent changes
- Next steps
- Active decisions and considerations
- Important patterns and preferences
- Learnings and project insights
4. `systemPatterns.md`
- System architecture
- Key technical decisions
- Design patterns in use
- Component relationships
- Critical implementation paths
5. `techContext.md`
- Technologies used
- Development setup
- Technical constraints
- Dependencies
- Tool usage patterns
6. `progress.md`
- What works
- What's left to build
- Current status
- Known issues
- Evolution of project decisions
### Additional Context
Create additional files/folders within memory-bank/ when they help organize:
- Complex feature documentation
- Integration specifications
- API documentation
- Testing strategies
- Deployment procedures
## Core Workflows
### Plan Mode
```mermaid
flowchart TD
    Start[Start] --> ReadFiles[Read Memory Bank]
    ReadFiles --> CheckFiles{Files Complete?}
    CheckFiles -->|No| Plan[Create Plan]
    Plan --> Document[Document in Chat]
    CheckFiles -->|Yes| Verify[Verify Context]
    Verify --> Strategy[Develop Strategy]
    Strategy --> Present[Present Approach]
```
### Act Mode
```mermaid
flowchart TD
    Start[Start] --> Context[Check Memory Bank]
    Context --> Update[Update Documentation]
    Update --> Execute[Execute Task]
    Execute --> Document[Document Changes]
```
## Documentation Updates
Memory Bank updates occur when:
1. Discovering new project patterns
2. After implementing significant changes
3. When user requests with **update memory bank** (MUST review ALL files)
4. When context needs clarification
```mermaid
flowchart TD
    Start[Update Process]
    subgraph Process
        P1[Review ALL Files]
        P2[Document Current State]
        P3[Clarify Next Steps]
        P4[Document Insights & Patterns]
        P1 --> P2 --> P3 --> P4
    end
    Start --> Process
```
Note: When triggered by **update memory bank**, I MUST review every memory bank file, even if some don't require updates. Focus particularly on activeContext.md and progress.md as they track current state.
REMEMBER: After every memory reset, I begin completely fresh. The Memory Bank is my only link to previous work. It must be maintained with precision and clarity, as my effectiveness depends entirely on its accuracy.


@ -3,22 +3,25 @@ name: build container
run-name: build container on ${{ gitea.actor }}
on: [push]
env:
GOPROXY: ${{ vars.GOPROXY }}
jobs:
build-container:
runs-on: bookworm
runs-on: docker
steps:
- name: Setup apt mirror
run: sed -i "s,deb.debian.org,${{ vars.DEBIAN_MIRROR }},g" ${{ vars.DEBIAN_APT_SOURCES }}
if: ${{ vars.DEBIAN_MIRROR && vars.DEBIAN_APT_SOURCES }}
- name: Setup debian environment
run: apt update && apt install -y podman podman-compose nodejs
- name: Setup cache for podman
uses: actions/cache@v4
with:
path: |
/var/lib/containers
key: podman-storage
- name: Check out repository code
uses: actions/checkout@v4
- name: Build container
run: REGISTRY=${{ github.server_url}}; REGISTRY=${REGISTRY##https://}; REGISTRY=${REGISTRY##http://}; podman build -t $REGISTRY/${{ github.repository }} .
run: REGISTRY=${{ github.server_url}}; REGISTRY=${REGISTRY##https://}; REGISTRY=${REGISTRY##http://}; podman build --build-arg GOPROXY=${{ vars.GOPROXY }} -t $REGISTRY/${{ github.repository }} .
- name: Login to Container Registry
run: echo "${{ secrets.ACTION_PACKAGE_WRITE_TOKEN }}" | podman login ${{ github.server_url }} -u ${{ github.repository_owner }} --password-stdin
- name: Push Container Image


@ -7,22 +7,33 @@ on:
- go.mod
- go.lock
- go.work
- .gitea/workflows/go-test.yaml
env:
GOPROXY: ${{ vars.GOPROXY }}
jobs:
test:
runs-on: bookworm
runs-on: docker
steps:
- name: Setup apt mirror
run: sed -i "s,deb.debian.org,${{ vars.DEBIAN_MIRROR }},g" ${{ vars.DEBIAN_APT_SOURCES }}
if: ${{ vars.DEBIAN_MIRROR && vars.DEBIAN_APT_SOURCES }}
- name: Setup Debian environment
run: apt update && apt install -y nodejs
- name: Setup cache for golang toolchain
uses: actions/cache@v4
with:
path: |
/opt/hostedtoolcache/go
/root/go/pkg/mod
/root/.cache/go-build
key: ${{ runner.os }}-golang
- name: Setup Golang
uses: actions/setup-go@v5
with:
go-version: 1.24.0
cache-dependency-path: |
go.sum
- name: Check out repository code
uses: actions/checkout@v4
- name: Run tests

.gitignore vendored

@ -1,4 +1,7 @@
.vscode
data
__debug*
__debug*
.env
.envrc


@ -1,10 +1,12 @@
FROM docker.io/library/golang:1.23-alpine
ARG GOPROXY
FROM docker.io/library/golang:1.24-alpine
ARG GOPROXY=direct
ARG TAGS=""
ARG ALPINE_MIRROR=https://mirrors.ustc.edu.cn
WORKDIR /src
COPY go.mod go.sum ./
RUN apk add git && env GOPROXY=${GOPROXY:-direct} go mod download
RUN sed -i "s,https://dl-cdn.alpinelinux.org,${ALPINE_MIRROR}," /etc/apk/repositories; apk add git && env GOPROXY=${GOPROXY:-direct} go mod download
ADD . /src
RUN go build -o /cache-proxy ./cmd/proxy
RUN go build -o /cache-proxy -tags "${TAGS}" ./cmd/proxy
FROM docker.io/library/alpine:3.21 AS runtime
COPY --from=0 /cache-proxy /bin/cache-proxy

cmd/proxy/debug.go Normal file

@ -0,0 +1,17 @@
//go:build pprof
package main
import (
"flag"
"os"
)
func init() {
flag.BoolVar(&pprofEnabled, "pprof", true, "")
if v, ok := os.LookupEnv("SENTRY_DSN"); ok {
sentrydsn = v
}
flag.StringVar(&sentrydsn, "sentry", sentrydsn, "sentry dsn to report errors")
}


@ -4,13 +4,14 @@ import (
"flag"
"log/slog"
"net/http"
"net/http/pprof"
"os"
"path/filepath"
"time"
cacheproxy "git.jeffthecoder.xyz/guochao/cache-proxy"
"git.jeffthecoder.xyz/guochao/cache-proxy/pkgs/middleware"
"git.jeffthecoder.xyz/guochao/cache-proxy/pkgs/middleware/handlerlog"
"git.jeffthecoder.xyz/guochao/cache-proxy/pkgs/middleware/handlerctx"
"git.jeffthecoder.xyz/guochao/cache-proxy/pkgs/middleware/httplog"
"git.jeffthecoder.xyz/guochao/cache-proxy/pkgs/middleware/recover"
@ -22,6 +23,7 @@ var (
configFilePath = "config.yaml"
logLevel = "info"
sentrydsn = ""
pprofEnabled = false
)
func init() {
@ -31,12 +33,8 @@ func init() {
if v, ok := os.LookupEnv("LOG_LEVEL"); ok {
logLevel = v
}
if v, ok := os.LookupEnv("SENTRY_DSN"); ok {
sentrydsn = v
}
flag.StringVar(&configFilePath, "config", configFilePath, "path to config file")
flag.StringVar(&logLevel, "log-level", logLevel, "log level. (trace, debug, info, warn, error)")
flag.StringVar(&sentrydsn, "sentry", sentrydsn, "sentry dsn to report errors")
}
func configFromFile(path string) (*cacheproxy.Config, error) {
@ -57,9 +55,9 @@ func configFromFile(path string) (*cacheproxy.Config, error) {
Local: &cacheproxy.LocalStorage{
Path: "./data",
TemporaryFilePattern: "temp.*",
},
Accel: cacheproxy.Accel{
ResponseWithHeaders: []string{"X-Sendfile", "X-Accel-Redirect"},
Accel: cacheproxy.Accel{
RespondWithHeaders: []string{"X-Sendfile", "X-Accel-Redirect"},
},
},
},
Misc: cacheproxy.MiscConfig{
@ -124,10 +122,18 @@ func main() {
mux.HandleFunc("GET /", server.HandleRequestWithCache)
if pprofEnabled {
mux.HandleFunc("GET /debug/pprof/", pprof.Index)
mux.HandleFunc("GET /debug/pprof/cmdline", pprof.Cmdline)
mux.HandleFunc("GET /debug/pprof/profile", pprof.Profile)
mux.HandleFunc("GET /debug/pprof/symbol", pprof.Symbol)
mux.HandleFunc("GET /debug/pprof/trace", pprof.Trace)
}
slog.With("addr", ":8881").Info("serving app")
if err := http.ListenAndServe(":8881", middleware.Use(mux,
recover.Recover(),
handlerlog.FindHandler(mux),
handlerctx.FindHandler(mux),
httplog.Log(httplog.Config{LogStart: true, LogFinish: true}),
)); err != nil {
slog.With("error", err).Error("failed to start server")

compose.release.yaml Normal file

@ -0,0 +1,20 @@
services:
cache-proxy:
image: ${REPOSITORY:-git.jeffthecoder.xyz}/${OWNER:-public}/cache-proxy:${VERSION:-latest}
build:
args:
GOPROXY: ${GOPROXY:-direct}
ALPINE_MIRROR: ${ALPINE_MIRROR:-https://mirrors.ustc.edu.cn}
restart: always
volumes:
- ./config.yaml:/config.yaml:ro
- ./data:/data
ports:
- 8881:8881
environment:
- LOG_LEVEL=info
env_file:
- path: .env
required: false


@ -1,11 +1,12 @@
services:
cache-proxy:
image: 100.64.0.2:3000/guochao/cache-proxy
image: ${REPOSITORY:-git.jeffthecoder.xyz}/${OWNER:-public}/cache-proxy:${VERSION:-latest}
build:
args:
GOPROXY: ${GOPROXY:-direct}
ALPINE_MIRROR: ${ALPINE_MIRROR:-https://mirrors.ustc.edu.cn}
TAGS: pprof
pull_policy: always
restart: always
volumes:
@ -14,8 +15,8 @@ services:
ports:
- 8881:8881
environment:
- HTTPS_PROXY=http://100.64.0.23:7890
- HTTP_PROXY=http://100.64.0.23:7890
- ALL_PROXY=http://100.64.0.23:7890
- SENTRY_DSN=http://3b7336602c77427d96318074cd86ee04@100.64.0.2:8000/2
- SENTRY_DSN=${SENTRY_DSN}
- LOG_LEVEL=debug
env_file:
- path: .env
required: false


@ -10,43 +10,55 @@ type UpstreamPriorityGroup struct {
Priority int `yaml:"priority"`
}
type UpstreamMatch struct {
type PathTransformation struct {
Match string `yaml:"match"`
Replace string `yaml:"replace"`
}
type HeaderChecker struct {
Name string `yaml:"name"`
Match *string `yaml:"match"`
}
type Checker struct {
StatusCodes []int `yaml:"status-codes"`
Headers []HeaderChecker `yaml:"headers"`
}
type Upstream struct {
Server string `yaml:"server"`
Match UpstreamMatch `yaml:"match"`
Path PathTransformation `yaml:"path"`
Checkers []Checker `yaml:"checkers"`
AllowedRedirect *string `yaml:"allowed-redirect"`
PriorityGroups []UpstreamPriorityGroup `yaml:"priority-groups"`
}
func (upstream Upstream) GetPath(orig string) (string, bool, error) {
if upstream.Match.Match == "" || upstream.Match.Replace == "" {
if upstream.Path.Match == "" || upstream.Path.Replace == "" {
return orig, true, nil
}
matcher, err := regexp.Compile(upstream.Match.Match)
matcher, err := regexp.Compile(upstream.Path.Match)
if err != nil {
return "", false, err
}
return matcher.ReplaceAllString(orig, upstream.Match.Replace), matcher.MatchString(orig), nil
return matcher.ReplaceAllString(orig, upstream.Path.Replace), matcher.MatchString(orig), nil
}
type LocalStorage struct {
Path string `yaml:"path"`
TemporaryFilePattern string `yaml:"temporary-file-pattern"`
Accel Accel `yaml:"accel"`
}
type Accel struct {
EnableByHeader string `yaml:"enable-by-header"`
ResponseWithHeaders []string `yaml:"response-with-headers"`
EnableByHeader string `yaml:"enable-by-header"`
RespondWithHeaders []string `yaml:"respond-with-headers"`
}
type Storage struct {
Type string `yaml:"type"`
Local *LocalStorage `yaml:"local"`
Accel Accel `yaml:"accel"`
}
type CachePolicyOnPath struct {


@ -1,108 +1,120 @@
upstream:
- server: https://mirrors.aliyun.com
match:
match: /(debian|ubuntu|ubuntu-releases|alpine|archlinux|kali|manjaro|msys2|almalinux|rocky|centos|centos-stream|centos-vault|fedora|epel|gnu)/(.*)
path:
match: /(debian|ubuntu|ubuntu-ports|ubuntu-releases|alpine|archlinux|kali|manjaro|msys2|almalinux|rocky|centos|centos-stream|centos-vault|fedora|epel|gnu|pypi)/(.*)
replace: '/$1/$2'
- server: https://mirrors.tencent.com
match:
match: /(debian|ubuntu|ubuntu-releases|alpine|archlinux|kali|manjaro|msys2|almalinux|rocky|centos|centos-stream|centos-vault|fedora|epel|gnu)/(.*)
path:
match: /(debian|ubuntu|ubuntu-ports|ubuntu-releases|alpine|archlinux|kali|manjaro|msys2|almalinux|rocky|centos|centos-stream|centos-vault|fedora|epel|gnu)/(.*)
replace: '/$1/$2'
- server: https://mirrors.ustc.edu.cn
match:
match: /(debian|ubuntu|ubuntu-releases|alpine|archlinux|kali|manjaro|msys2|almalinux|rocky|centos|centos-stream|centos-vault|fedora|epel|elrepo|remi|rpmfusion|tailscale|gnu|rust-static)/(.*)
path:
match: /(debian|ubuntu|ubuntu-releases|alpine|archlinux|kali|manjaro|msys2|almalinux|rocky|centos|centos-stream|centos-vault|fedora|epel|elrepo|remi|rpmfusion|tailscale|gnu|rust-static|pypi)/(.*)
replace: '/$1/$2'
checkers:
headers:
- name: content-type
match: application/octet-stream
- server: https://packages.microsoft.com/repos/code
match:
path:
match: /microsoft-code(.*)
replace: '$1'
- server: https://archive.ubuntu.com
match:
path:
match: /ubuntu/(.*)
replace: /$1
- server: https://ports.ubuntu.com
path:
match: /ubuntu-ports/(.*)
replace: /$1
# priority-groups:
# - match: /dists/.*
# priority: 10
- server: https://releases.ubuntu.com/
match:
path:
match: /ubuntu-releases/(.*)
replace: /$1
- server: https://apt.releases.hashicorp.com
match:
path:
match: /hashicorp-deb/(.*)
replace: /$1
- server: https://rpm.releases.hashicorp.com
match:
path:
match: /hashicorp-rpm/(.*)
replace: /$1
- server: https://dl.google.com/linux/chrome
match:
path:
match: /google-chrome/(.*)
replace: /$1
- server: https://pkgs.tailscale.com/stable
match:
path:
match: /tailscale/(.*)
replace: /$1
- server: http://update.cs2c.com.cn:8080
match:
path:
match: /kylinlinux/(.*)
replace: /$1
- server: https://cdn.jsdelivr.net
match:
path:
match: /jsdelivr/(.*)
replace: /$1
- server: https://dl-cdn.alpinelinux.org
match:
path:
match: /alpine/(.*)
replace: /$1
- server: https://repo.almalinux.org
match:
path:
match: /almalinux/(.*)
replace: /almalinux/$1
- server: https://dl.rockylinux.org/pub/rocky
match:
path:
match: /rocky/(.*)
replace: /$1
- server: https://dl.fedoraproject.org/pub/epel
match:
path:
match: /epel/(.*)
replace: /$1
- server: https://deb.debian.org
match:
path:
match: /(debian|debian-security)/(.*)
replace: /$1/$2
# priority-groups:
# - match: /dists/.*
# priority: 10
- server: https://static.rust-lang.org
match:
path:
match: /rust-static/(.*)
replace: /$1
- server: https://storage.flutter-io.cn
match:
path:
match: /flutter-storage/(.*)
replace: /$1
- server: https://cdn-mirror.chaotic.cx/chaotic-aur
match:
path:
match: /chaotic-aur/(.*)
replace: /$1
allowed-redirect: "^https?://cf-builds.garudalinux.org/.*/chaotic-aur/.*$"
- server: http://download.zfsonlinux.org
path:
match: /openzfs/(.*)
replace: /$1
misc:
first-chunk-bytes: 31457280 # 1024*1024*30, all upstreams with 200 response will wait for the first chunks to select a upstream by bandwidth, instead of by latency. defaults to 50M
# chunk-bytes: 1048576 # 1024*1024
first-chunk-bytes: 31457280 ## 1024*1024*30; all upstreams with a 200 response wait for the first chunk, so an upstream is selected by bandwidth instead of by latency. defaults to 50M
# chunk-bytes: 1048576 ## 1024*1024
cache:
timeout: 1h
policies:
- match: '.*\.(rpm|deb|apk|iso)$' # rpm/deb/apk/iso won't change, create only
- match: '.*\.(rpm|deb|apk|iso)$' ## rpm/deb/apk/iso won't change, create only
refresh-after: never
- match: '^/.*-security/.*' # mostly ubuntu/debian security
- match: '^/.*-security/.*' ## mostly ubuntu/debian security
refresh-after: 1h
- match: '^/archlinux-localaur/.*' # archlinux local repo
- match: '^/archlinux-localaur/.*' ## archlinux local repo
refresh-after: never
- match: '^/(archlinux.*|chaotic-aur)/*.tar.*' # archlinux packages
- match: '^/(archlinux.*|chaotic-aur)/*.tar.*' ## archlinux packages
refresh-after: never
- match: '/chaotic-aur/.*\.db$' # archlinux chaotic-aur database
- match: '/chaotic-aur/.*\.db$' ## archlinux chaotic-aur database
refresh-after: 24h
- match: '/centos/7'
refresh-after: never
@ -110,20 +122,23 @@ cache:
refresh-after: never
storage:
type: local # ignored
type: local ## ignored for now
local:
path: ./data # defaults to ./data
path: ./data ## defaults to ./data
accel:
# example nginx config:
## location /i {
## internal;
## alias /path/to/data;
## }
## location / {
## proxy_pass 127.0.0.1:8881;
## proxy_set_header X-Accel-Path /i;
## }
##
accel:
## example nginx config:
## location /i {
## internal;
## alias /path/to/data;
## }
## location / {
## proxy_pass 127.0.0.1:8881;
## proxy_set_header X-Accel-Path /i;
## }
##
## then cache-proxy will respond to the backend with a header indicating sendfile(join(X-Accel-Path, relpath))
# enable-by-header: "X-Accel-Path"
# enable-by-header: "X-Accel-Path"
## respond with different headers to
# respond-with-headers: [X-Sendfile, X-Accel-Redirect]


@ -0,0 +1,28 @@
# Current Work Focus
The current focus is to start optimizing and enhancing the project according to the to-do items in `progress.md`. With comprehensive test coverage in place, we have strong confidence in the stability and correctness of the existing code and can refactor safely.
## Recent Changes
- **Completed `server_test.go`**:
  - Filled in all pending test cases in `server_test.go`, including tests for `X-Accel-Redirect` and path-traversal attacks.
  - Reviewed and corrected the comments on all test cases so they match the actual behavior of the code.
  - All tests pass, laying a solid foundation for subsequent development and refactoring.
- **Updated `progress.md`**:
  - Marked "add more comprehensive unit and integration tests" as done.
## Next Steps
1. **Code refactoring**:
   - Following the to-do items in `progress.md`, first refactor the complex functions in `server.go` (such as `streamOnline`) to improve readability and maintainability.
2. **Feature enhancements**:
   - Once the code structure is improved, consider new features such as additional storage backends (e.g. S3) or more sophisticated load-balancing strategies.
3. **Continuous documentation updates**:
   - While refactoring or adding features, update `systemPatterns.md` and other relevant documents in step to record new design decisions.
## Important Patterns & Preferences
- **Code style**: follow Go community best practices; keep code clear and concise, with appropriate comments.
- **Concurrency**: prefer Go's native concurrency primitives (goroutines and channels) for concurrent problems.
- **Configuration-driven**: keep core logic separate from configuration, so the project adapts to different scenarios by editing the config file rather than the code.
- **Documentation-driven**: record all significant design decisions, architecture changes, and feature implementations in the Memory Bank.


@ -0,0 +1,26 @@
# Product Context
In many scenarios, users need to download files from software mirror sites, file servers, or other resource hosts. Due to network fluctuations, server load, or geography, the download speed and stability of a single server cannot be guaranteed.
## Problems Solved
`cache-proxy` addresses the following problems:
1. **Single point of failure**: when the only mirror server a user relies on goes down or becomes unreachable, downloads are interrupted.
2. **Poor network speed**: users may not be connected to the fastest available server, leading to long download times.
3. **Repeated downloads**: when several users (or the same user repeatedly) download the same file, unnecessary network traffic is generated and server load increases.
## How It Works
`cache-proxy` solves these problems by introducing a proxy layer between users and multiple upstream servers.
- On receiving a file request, it requests the file from all configured upstream servers concurrently.
- It picks the first upstream that returns a successful response and streams the data to the user.
- At the same time, it caches the file to local disk.
- Subsequent requests for the same file are served directly from local disk while the cache is fresh, greatly improving response time and reducing network traffic.
## User Experience Goals
- **Seamless integration**: users simply point their download URL at `cache-proxy`; the backend complexity stays hidden.
- **Faster downloads**: by racing the sources and picking the fastest, users always get the best download experience.
- **Higher availability**: even if some upstream servers fail, the service keeps working as long as one remains available.

memory-bank/progress.md Normal file

@ -0,0 +1,40 @@
# Project Progress
This is the initial state of a working `cache-proxy` project. The core functionality is implemented and operational.
## Completed Features
- **Core proxy logic**:
  - Load configuration from `config.yaml`.
  - Start an HTTP server and listen for requests.
  - Check the local cache based on the request path.
- **Concurrent upstream requests**:
  - Issue requests to upstream servers concurrently.
  - Correctly select the fastest-responding server.
  - Cancel the remaining requests once a server has been selected.
- **Cache management**:
  - Cache downloaded files to local disk.
  - Support time-based cache refresh policies.
  - Use the `If-Modified-Since` request header to avoid unnecessary data transfer.
- **Concurrent request handling**:
  - Correctly handle multiple concurrent requests for the same file, ensuring it is downloaded only once.
- **Accelerated downloads**:
  - Support delegating file delivery to a frontend server (such as Nginx) via the `X-Sendfile` / `X-Accel-Redirect` headers.
- **Comprehensive test coverage**:
  - Completed `server_test.go`, providing unit and integration tests for all core functionality.
  - Tests cover the happy path, edge cases (such as timeouts and upstream failures), and security (such as path traversal).
  - Reviewed the test code and comments for accuracy and consistency.
  - All tests pass, confirming the robustness of the existing code.
## To Do
- **Feature enhancements**:
  - Only local file storage is supported today; other storage backends (such as S3 or Redis) could be added.
  - Add more sophisticated load-balancing strategies beyond "pick the fastest".
  - Add more detailed monitoring and metrics (such as Prometheus metrics).
- **Code optimization**:
  - Refactor the more complex functions in `server.go` (such as `streamOnline`) to improve readability and maintainability.
## Known Issues
- No obvious bugs have been found at this stage; the code structure is clear and the logic is complete.


@ -0,0 +1,10 @@
# Project Brief: Cache-Proxy
This is a high-performance caching proxy server written in Go. Its core function is to accept client requests and forward them to multiple configured upstream mirror servers. It issues the upstream requests concurrently, returns the fastest response to the client, and caches the result locally so subsequent requests are served faster.
## Core Requirements
- **Performance**: handle concurrent requests quickly and use the cache effectively.
- **Reliability**: automatically switch to another available server when an upstream becomes unreachable.
- **Configurability**: upstreams, cache policies, and other behavior can easily be added, removed, and tuned via the configuration file.
- **Transparency**: the proxy is transparent to clients; they access it like any ordinary server.


@ -0,0 +1,46 @@
# System Architecture & Design Patterns
The design of `cache-proxy` follows several key patterns and principles to meet its goals of high performance and high availability.
## Core Architecture
The system has three main parts:
1. **HTTP server layer**: receives client requests and applies middleware for logging, panic recovery, and other cross-cutting concerns.
2. **Cache handling layer**: checks whether the requested file exists in the local cache and, based on the cache policy, decides whether to serve it directly or go upstream.
3. **Upstream selection & download layer**: the core of the system; fetches data concurrently from multiple upstreams and manages the download.
## Key Design Patterns
### 1. Racing Requests
This is the core pattern behind the "pick the fastest" behavior.
- The `fastesUpstream` function spawns one goroutine per upstream server.
- All goroutines send their upstream requests concurrently.
- `sync.Once` guarantees that only one goroutine "wins" and becomes the final data source.
- Once a goroutine wins, it calls the `context.CancelFunc` to tell all other goroutines to stop, avoiding unnecessary resource usage.
### 2. Producer-Consumer
The file download uses a producer-consumer pattern.
- **Producer**: a goroutine in `tryUpstream` reads chunks from the upstream server and places them on a `chan Chunk`.
- **Consumer**: code in `streamOnline` reads from the `chan Chunk` and does two things:
  1. Writes the data into a `bytes.Buffer` for subsequent requesters.
  2. Writes the data to a local temporary file for the persistent cache.
### 3. Mutex for Concurrent Access
To handle multiple clients requesting the same file at the same time, the system uses a `sync.Mutex` together with a `map[string]*StreamObject`.
- The first request takes the lock and creates a `StreamObject` representing the in-flight download.
- Subsequent requests for the same file find the existing `StreamObject`; instead of going upstream again, they wait and read data from the shared `StreamObject`.
- When the download finishes, the `StreamObject` is removed from the map.
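A sketch of this deduplication map, with the caveat that the real `StreamObject` also carries the response buffer and metadata, and the `Deduper` wrapper here is a hypothetical name:

```go
package main

import "sync"

// StreamObject stands in for the project's in-flight download handle.
type StreamObject struct{ done chan struct{} }

// Deduper guards the map of in-flight downloads with a mutex.
type Deduper struct {
	mu sync.Mutex
	o  map[string]*StreamObject
}

// acquire returns the StreamObject for key and whether this caller is the
// first requester (and therefore responsible for the download).
func (d *Deduper) acquire(key string) (*StreamObject, bool) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if obj, ok := d.o[key]; ok {
		return obj, false // join the in-flight download
	}
	obj := &StreamObject{done: make(chan struct{})}
	d.o[key] = obj
	return obj, true
}

// release removes the finished download from the map and wakes the waiters.
func (d *Deduper) release(key string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	close(d.o[key].done)
	delete(d.o, key)
}
```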
### 4. Middleware
The project builds its request-handling chain with Go's standard `http.Handler` interface and the middleware pattern.
- Reusable middleware such as `httplog` and `recover` live under the `pkgs/middleware` directory.
- This makes it easy to add or remove logging, authentication, error handling, and so on without touching the core business logic.


@ -0,0 +1,26 @@
# Tech Stack & Development Environment
## Core Technologies
- **Language**: Go (Golang)
- **Main dependencies**:
  - `gopkg.in/yaml.v3`: parses the `config.yaml` configuration file.
  - `log/slog`: structured logging, built into Go 1.21+.
  - `net/http`: HTTP server and client.
  - `github.com/getsentry/sentry-go`: (optional) error tracking.
## Development & Build
- **Dependency management**: Go Modules (`go.mod`, `go.sum`).
- **Configuration**: the project's behavior is driven by the `config.yaml` file.
- **Containerization**:
  - `Dockerfile`: builds the project's container image.
  - `compose.yaml`: runs the service in the development environment.
  - `compose.release.yaml`: deploys the service in production.
## Key Technical Points
- **Concurrency model**: heavy use of goroutines and channels for high concurrency, especially the concurrent request pattern in `fastesUpstream`.
- **Streaming**: data flows through the `io.Reader` and `io.Writer` interfaces and is never loaded into memory in full, which is essential for large files.
- **Error handling**: on top of Go's standard error handling, critical paths (such as `fastesUpstream`) use the `context` package for timeouts and cancellation.
- **Structured logging**: the `slog` library produces structured, machine-parseable logs that ease debugging and monitoring.


@ -1,4 +1,4 @@
package handlerlog
package handlerctx
import (
"context"


@ -6,7 +6,7 @@ import (
"net/http"
"time"
"git.jeffthecoder.xyz/guochao/cache-proxy/pkgs/middleware/handlerlog"
"git.jeffthecoder.xyz/guochao/cache-proxy/pkgs/middleware/handlerctx"
)
type ctxKey int
@ -43,7 +43,7 @@ func Log(config Config) func(http.Handler) http.Handler {
"path", r.URL.Path,
}
if pattern, handlerFound := handlerlog.Pattern(r); handlerFound {
if pattern, handlerFound := handlerctx.Pattern(r); handlerFound {
args = append(args, "handler", pattern)
}

server.go

@ -4,9 +4,9 @@ import (
"bytes"
"context"
"errors"
"fmt"
"io"
"log/slog"
"net"
"net/http"
"os"
"path/filepath"
@ -25,12 +25,17 @@ const (
var zeroTime time.Time
var preclosedChan = make(chan struct{})
func init() {
close(preclosedChan)
}
var (
httpClient = http.Client{
// check allowed redirect
CheckRedirect: func(req *http.Request, via []*http.Request) error {
lastRequest := via[len(via)-1]
if allowedRedirect, ok := lastRequest.Context().Value(reqCtxAllowedRedirect).(string); ok {
if allowedRedirect, ok := req.Context().Value(reqCtxAllowedRedirect).(string); ok {
if matched, err := regexp.MatchString(allowedRedirect, req.URL.String()); err != nil {
return err
} else if !matched {
@ -106,7 +111,7 @@ type Chunk struct {
}
func (server *Server) serveFile(w http.ResponseWriter, r *http.Request, path string) {
if location := r.Header.Get(server.Storage.Accel.EnableByHeader); server.Storage.Accel.EnableByHeader != "" && location != "" {
if location := r.Header.Get(server.Storage.Local.Accel.EnableByHeader); server.Storage.Local.Accel.EnableByHeader != "" && location != "" {
relPath, err := filepath.Rel(server.Storage.Local.Path, path)
if err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
@ -114,7 +119,7 @@ func (server *Server) serveFile(w http.ResponseWriter, r *http.Request, path str
}
accelPath := filepath.Join(location, relPath)
for _, headerKey := range server.Storage.Accel.ResponseWithHeaders {
for _, headerKey := range server.Storage.Local.Accel.RespondWithHeaders {
w.Header().Set(headerKey, accelPath)
}
@ -147,7 +152,7 @@ func (server *Server) HandleRequestWithCache(w http.ResponseWriter, r *http.Requ
} else if localStatus != localNotExists {
if localStatus == localExistsButNeedHead {
if ranged {
server.streamOnline(nil, r, mtime, fullpath)
<-server.streamOnline(nil, r, mtime, fullpath)
server.serveFile(w, r, fullpath)
} else {
server.streamOnline(w, r, mtime, fullpath)
@ -157,7 +162,7 @@ func (server *Server) HandleRequestWithCache(w http.ResponseWriter, r *http.Requ
}
} else {
if ranged {
server.streamOnline(nil, r, mtime, fullpath)
<-server.streamOnline(nil, r, mtime, fullpath)
server.serveFile(w, r, fullpath)
} else {
server.streamOnline(w, r, mtime, fullpath)
@ -209,7 +214,7 @@ func (server *Server) checkLocal(w http.ResponseWriter, _ *http.Request, key str
return localNotExists, zeroTime, nil
}
func (server *Server) streamOnline(w http.ResponseWriter, r *http.Request, mtime time.Time, key string) {
func (server *Server) streamOnline(w http.ResponseWriter, r *http.Request, mtime time.Time, key string) <-chan struct{} {
memoryObject, exists := server.o[r.URL.Path]
locked := false
defer func() {
@ -241,32 +246,39 @@ func (server *Server) streamOnline(w http.ResponseWriter, r *http.Request, mtime
slog.With("error", err).Warn("failed to stream response with existing memory object")
}
}
return preclosedChan
} else {
slog.With("mtime", mtime).Debug("checking fastest upstream")
selectedIdx, response, chunks, err := server.fastesUpstream(r, mtime)
selectedIdx, response, chunks, err := server.fastestUpstream(r, mtime)
if chunks == nil && mtime != zeroTime {
slog.With("upstreamIdx", selectedIdx, "key", key).Debug("not modified. using local version")
if w != nil {
server.serveFile(w, r, key)
}
return
return preclosedChan
}
if err != nil {
slog.With("error", err).Warn("failed to select fastest upstream")
http.Error(w, err.Error(), http.StatusInternalServerError)
return
if w != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
}
return preclosedChan
}
if selectedIdx == -1 || response == nil || chunks == nil {
slog.Debug("no upstream is selected")
http.NotFound(w, r)
return
if w != nil {
http.NotFound(w, r)
}
return preclosedChan
}
if response.StatusCode == http.StatusNotModified {
slog.With("upstreamIdx", selectedIdx).Debug("not modified. using local version")
os.Chtimes(key, zeroTime, time.Now())
server.serveFile(w, r, key)
return
if w != nil {
server.serveFile(w, r, key)
}
return preclosedChan
}
slog.With(
@ -332,8 +344,33 @@ func (server *Server) streamOnline(w http.ResponseWriter, r *http.Request, mtime
}
if err != nil {
slog.With("error", err, "upstreamIdx", selectedIdx).Error("something happened during download. will not cache this response")
logger := slog.With("upstreamIdx", selectedIdx)
logger.Error("something happened during download. will not cache this response. setting lingering to reset the connection.")
hijacker, ok := w.(http.Hijacker)
if !ok {
logger.Warn("response writer is not a hijacker. failed to set lingering")
return preclosedChan
}
conn, _, err := hijacker.Hijack()
if err != nil {
logger.With("error", err).Warn("hijack failed. failed to set lingering")
return preclosedChan
}
defer conn.Close()
tcpConn, ok := conn.(*net.TCPConn)
if !ok {
logger.With("error", err).Warn("connection is not a *net.TCPConn. failed to set lingering")
return preclosedChan
}
if err := tcpConn.SetLinger(0); err != nil {
logger.With("error", err).Warn("failed to set lingering")
return preclosedChan
}
logger.Debug("connection set to linger. it will be reset once the conn.Close is called")
}
fileWrittenCh := make(chan struct{})
go func() {
defer func() {
server.lu.Lock()
@ -341,6 +378,7 @@ func (server *Server) streamOnline(w http.ResponseWriter, r *http.Request, mtime
delete(server.o, r.URL.Path)
slog.Debug("memory object released")
close(fileWrittenCh)
}()
if err == nil {
@ -405,10 +443,11 @@ func (server *Server) streamOnline(w http.ResponseWriter, r *http.Request, mtime
}
}()
return fileWrittenCh
}
}
func (server *Server) fastesUpstream(r *http.Request, lastModified time.Time) (resultIdx int, resultResponse *http.Response, resultCh chan Chunk, resultErr error) {
func (server *Server) fastestUpstream(r *http.Request, lastModified time.Time) (resultIdx int, resultResponse *http.Response, resultCh chan Chunk, resultErr error) {
returnLock := &sync.Mutex{}
upstreams := len(server.Upstreams)
cancelFuncs := make([]func(), upstreams)
@ -594,18 +633,54 @@ func (server *Server) tryUpstream(ctx context.Context, upstreamIdx, priority int
if err != nil {
return nil, nil, err
}
streaming := false
defer func() {
if !streaming && response != nil {
response.Body.Close()
}
}()
if response.StatusCode == http.StatusNotModified {
return response, nil, nil
}
if response.StatusCode >= 400 && response.StatusCode < 500 {
return nil, nil, nil
responseCheckers := upstream.Checkers
if len(responseCheckers) == 0 {
responseCheckers = append(responseCheckers, Checker{})
}
if response.StatusCode < 200 || response.StatusCode >= 500 {
logger.With(
"url", newurl,
"status", response.StatusCode,
).Warn("unexpected status")
return response, nil, fmt.Errorf("unexpected status(url=%v): %v: %v", newurl, response.StatusCode, response)
for _, checker := range responseCheckers {
if len(checker.StatusCodes) == 0 {
checker.StatusCodes = append(checker.StatusCodes, http.StatusOK)
}
if !slices.Contains(checker.StatusCodes, response.StatusCode) {
return nil, nil, err
}
for _, headerChecker := range checker.Headers {
if headerChecker.Match == nil {
// check header exists
if _, ok := response.Header[headerChecker.Name]; !ok {
logger.Debug("missing header", "header", headerChecker.Name)
return nil, nil, nil
}
} else {
// check header match
value := response.Header.Get(headerChecker.Name)
if matched, err := regexp.MatchString(*headerChecker.Match, value); err != nil {
return nil, nil, err
} else if !matched {
logger.Debug("invalid header value",
"header", headerChecker.Name,
"value", value,
"matcher", *headerChecker.Match,
)
return nil, nil, nil
}
}
}
}
var currentOffset int64
@ -622,8 +697,10 @@ func (server *Server) tryUpstream(ctx context.Context, upstreamIdx, priority int
}
ch <- Chunk{buffer: buffer[:n]}
streaming = true
go func() {
defer close(ch)
defer response.Body.Close()
for {
buffer := make([]byte, server.Misc.ChunkBytes)

server_test.go Normal file

File diff suppressed because it is too large.