feat(proxy): fail closed with namespace-isolated reasoning cache and safer defaults (#9)
parent
87bef7b660
commit
bf4ba45207
31
README.md
31
README.md
|
|
@ -4,7 +4,7 @@ Compatibility proxy connecting Cursor to DeepSeek thinking models (`deepseek-v4-
|
|||
|
||||
## What It Does
|
||||
|
||||
- ✅ Caches DeepSeek `reasoning_content` from regular and streamed responses, then restores it on later tool-call turns when Cursor omits it. See [DeepSeek docs](https://api-docs.deepseek.com/guides/thinking_mode#tool-calls) for more details.
|
||||
- ✅ Caches DeepSeek `reasoning_content` from regular and streamed responses, then restores it on later tool-call turns when Cursor omits it. If the exact original reasoning is unavailable, the proxy fails closed instead of sending a fake placeholder. See [DeepSeek docs](https://api-docs.deepseek.com/guides/thinking_mode#tool-calls) for more details.
|
||||
- ✅ Mirrors streamed `reasoning_content` into Cursor-visible `<think>...</think>` text so that thinking tokens are shown in Cursor's UI. For BYOK/proxy mode, Cursor renders this as normal text, not as a native collapsible thinking block.
|
||||
- ✅ Starts an ngrok tunnel so Cursor can reach the local proxy through a public HTTPS URL.
|
||||
- ✅ Provides other compatibility fixes to make DeepSeek models run well in Cursor.
|
||||
|
|
@ -53,6 +53,8 @@ In Cursor, add the DeepSeek custom model and point it at this proxy:
|
|||
- API Key: your DeepSeek API key
|
||||
- Base URL: your ngrok HTTPS URL with the `/v1` API version path
|
||||
|
||||
The proxy respects the DeepSeek model name Cursor sends, such as `deepseek-v4-pro` or `deepseek-v4-flash`. The `model` field in `config.yaml` is only the fallback used when a request does not include a model.
|
||||
|
||||
For example, if ngrok dashboard shows `https://example.ngrok-free.app`, use:
|
||||
|
||||
```text
|
||||
|
|
@ -94,8 +96,19 @@ Select `deepseek-v4-pro` in Cursor and use chat or agent mode as usual.
|
|||
|
||||

|
||||
|
||||
## How It Works
|
||||
|
||||
DeepSeek's [thinking mode](https://api-docs.deepseek.com/guides/thinking_mode#tool-calls) requires `reasoning_content` from assistant messages in tool-call sequences to be passed back in later requests. Cursor may omit this field, causing DeepSeek to return a 400 error. This proxy sits between Cursor and DeepSeek (`Cursor → ngrok → proxy → DeepSeek API`) and repairs requests when it has the exact original reasoning cached.
|
||||
|
||||
- Core fix: every DeepSeek response, streaming or non-streaming, has its `reasoning_content` stored in a local SQLite cache keyed by message signature, tool-call ID, and tool-call function signature. On outgoing thinking-mode requests, the proxy restores missing `reasoning_content` for tool-call-related assistant messages and sends the complete history to DeepSeek. If the cache is cold, such as after a proxy restart, it returns a local error instead of fabricating reasoning.
|
||||
- Multi-conversation isolation: cache keys are scoped by a SHA-256 hash of the canonical conversation prefix (roles, content, tool calls, excluding `reasoning_content`) plus the upstream model/configuration and an API-key hash. Concurrent or interleaved threads with different histories get different scopes, so reused tool-call IDs do not collide. Byte-identical cloned histories are indistinguishable unless Cursor sends a differentiating history.
|
||||
- DeepSeek [prefix caching](https://api-docs.deepseek.com/guides/kv_cache) compatibility: the proxy does not inject synthetic thread IDs, timestamps, or cache-control messages into the prompt. When it restores cached reasoning, it restores the exact original string, preserving repeated prefixes for DeepSeek's automatic best-effort context cache.
|
||||
- Additional compatibility fixes: the proxy converts legacy `functions`/`function_call` fields to `tools`/`tool_choice`, preserves required and named tool-choice semantics, normalizes `reasoning_effort` aliases per DeepSeek docs, strips mirrored `<think>` blocks from assistant content, converts multi-part content arrays to plain text, logs DeepSeek prompt-cache usage when available, and mirrors `reasoning_content` into Cursor-visible `<think>...</think>` blocks for thinking display.
|
||||
|
||||
## Debugging
|
||||
|
||||
Normal logs avoid request/response bodies but still print compact request and usage statistics. `rounds` is the number of user turns in the forwarded history, `reasoning` is the number and character size of `reasoning_content` fields sent to DeepSeek, and `cache=hit/miss` comes from DeepSeek's `usage.prompt_cache_hit_tokens` / `prompt_cache_miss_tokens`.
|
||||
|
||||
Run with verbose output:
|
||||
|
||||
```bash
|
||||
|
|
@ -108,12 +121,28 @@ Run without ngrok for local curl testing:
|
|||
PROXY_NGROK=false deepseek-cursor-proxy --port 9000 --verbose
|
||||
```
|
||||
|
||||
If Cursor shows `missing_reasoning_content`, the current chat contains thinking-mode tool-call history whose original DeepSeek `reasoning_content` is not in the local cache. This commonly happens when continuing an older chat after a proxy restart, cache clear, or cache format/config change. The local 409 response includes a diagnostic placeholder so the cause is visible, but that placeholder is not forwarded to DeepSeek in the default safe mode. Start a new chat, or retry from the original tool-call turn while the proxy is running so it can capture the reasoning.
|
||||
|
||||
For debugging an old Cursor history, you can opt into a non-compliant compatibility fallback:
|
||||
|
||||
```bash
|
||||
deepseek-cursor-proxy --verbose --missing-reasoning-strategy placeholder
|
||||
```
|
||||
|
||||
This inserts a loud placeholder into missing `reasoning_content` fields and forwards the request. It may still be rejected by DeepSeek and should not be used for normal work.
|
||||
|
||||
Use another config file:
|
||||
|
||||
```bash
|
||||
deepseek-cursor-proxy --config ./dev.config.yaml
|
||||
```
|
||||
|
||||
Clear the local reasoning cache:
|
||||
|
||||
```bash
|
||||
deepseek-cursor-proxy --clear-reasoning-cache
|
||||
```
|
||||
|
||||
Run tests:
|
||||
|
||||
```bash
|
||||
|
|
|
|||
|
|
@ -1,6 +1,8 @@
|
|||
# This file was created automatically at ~/.deepseek-cursor-proxy/config.yaml.
|
||||
# API keys are read from Cursor's Authorization header and forwarded upstream.
|
||||
|
||||
# `model` is the fallback when a request has no model; Cursor's requested
|
||||
# DeepSeek model name is otherwise respected.
|
||||
base_url: https://api.deepseek.com
|
||||
model: deepseek-v4-pro
|
||||
thinking: enabled
|
||||
|
|
@ -12,5 +14,10 @@ port: 9000
|
|||
ngrok: true
|
||||
verbose: false
|
||||
request_timeout: 300
|
||||
max_request_body_bytes: 20971520
|
||||
cors: false
|
||||
|
||||
reasoning_content_path: reasoning_content.sqlite3
|
||||
missing_reasoning_strategy: reject
|
||||
reasoning_cache_max_age_seconds: 604800
|
||||
reasoning_cache_max_rows: 10000
|
||||
|
|
|
|||
|
|
@ -18,6 +18,8 @@ MISSING = object()
|
|||
DEFAULT_CONFIG_TEXT = """# This file was created automatically at ~/.deepseek-cursor-proxy/config.yaml.
|
||||
# API keys are read from Cursor's Authorization header and forwarded upstream.
|
||||
|
||||
# `model` is the fallback when a request has no model; Cursor's requested
|
||||
# DeepSeek model name is otherwise respected.
|
||||
base_url: https://api.deepseek.com
|
||||
model: deepseek-v4-pro
|
||||
thinking: enabled
|
||||
|
|
@ -29,8 +31,13 @@ port: 9000
|
|||
ngrok: true
|
||||
verbose: false
|
||||
request_timeout: 300
|
||||
max_request_body_bytes: 20971520
|
||||
cors: false
|
||||
|
||||
reasoning_content_path: reasoning_content.sqlite3
|
||||
missing_reasoning_strategy: reject
|
||||
reasoning_cache_max_age_seconds: 604800
|
||||
reasoning_cache_max_rows: 10000
|
||||
"""
|
||||
|
||||
|
||||
|
|
@ -163,8 +170,13 @@ class ProxyConfig:
|
|||
thinking: str = "enabled"
|
||||
reasoning_effort: str = "high"
|
||||
request_timeout: float = 300.0
|
||||
max_request_body_bytes: int = 20 * 1024 * 1024
|
||||
reasoning_content_path: Path = field(default_factory=default_reasoning_content_path)
|
||||
missing_reasoning_strategy: str = "reject"
|
||||
reasoning_cache_max_age_seconds: int = 7 * 24 * 60 * 60
|
||||
reasoning_cache_max_rows: int = 10000
|
||||
cursor_display_reasoning: bool = True
|
||||
cors: bool = False
|
||||
verbose: bool = False
|
||||
ngrok: bool = False
|
||||
|
||||
|
|
@ -195,6 +207,22 @@ class ProxyConfig:
|
|||
if thinking not in {"enabled", "disabled", "pass-through"}:
|
||||
thinking = "enabled"
|
||||
|
||||
missing_reasoning_strategy = (
|
||||
as_str(
|
||||
setting_value(
|
||||
settings,
|
||||
live_env,
|
||||
"missing_reasoning_strategy",
|
||||
"MISSING_REASONING_STRATEGY",
|
||||
),
|
||||
"reject",
|
||||
)
|
||||
.strip()
|
||||
.lower()
|
||||
)
|
||||
if missing_reasoning_strategy not in {"reject", "placeholder"}:
|
||||
missing_reasoning_strategy = "reject"
|
||||
|
||||
return cls(
|
||||
host=as_str(
|
||||
setting_value(
|
||||
|
|
@ -260,6 +288,15 @@ class ProxyConfig:
|
|||
),
|
||||
300.0,
|
||||
),
|
||||
max_request_body_bytes=as_int(
|
||||
setting_value(
|
||||
settings,
|
||||
live_env,
|
||||
"max_request_body_bytes",
|
||||
"PROXY_MAX_REQUEST_BODY_BYTES",
|
||||
),
|
||||
20 * 1024 * 1024,
|
||||
),
|
||||
reasoning_content_path=as_path(
|
||||
setting_value(
|
||||
settings,
|
||||
|
|
@ -270,6 +307,25 @@ class ProxyConfig:
|
|||
default_reasoning_content_path(),
|
||||
config_dir,
|
||||
),
|
||||
missing_reasoning_strategy=missing_reasoning_strategy,
|
||||
reasoning_cache_max_age_seconds=as_int(
|
||||
setting_value(
|
||||
settings,
|
||||
live_env,
|
||||
"reasoning_cache_max_age_seconds",
|
||||
"REASONING_CACHE_MAX_AGE_SECONDS",
|
||||
),
|
||||
7 * 24 * 60 * 60,
|
||||
),
|
||||
reasoning_cache_max_rows=as_int(
|
||||
setting_value(
|
||||
settings,
|
||||
live_env,
|
||||
"reasoning_cache_max_rows",
|
||||
"REASONING_CACHE_MAX_ROWS",
|
||||
),
|
||||
10000,
|
||||
),
|
||||
cursor_display_reasoning=as_bool(
|
||||
setting_value(
|
||||
settings,
|
||||
|
|
@ -279,6 +335,15 @@ class ProxyConfig:
|
|||
),
|
||||
True,
|
||||
),
|
||||
cors=as_bool(
|
||||
setting_value(
|
||||
settings,
|
||||
live_env,
|
||||
"cors",
|
||||
"PROXY_CORS",
|
||||
),
|
||||
False,
|
||||
),
|
||||
verbose=as_bool(
|
||||
setting_value(
|
||||
settings,
|
||||
|
|
|
|||
|
|
@ -76,8 +76,11 @@ def canonical_scope_message(message: dict[str, Any]) -> dict[str, Any]:
|
|||
return canonical
|
||||
|
||||
|
||||
def conversation_scope(messages: list[dict[str, Any]]) -> str:
|
||||
payload = [canonical_scope_message(message) for message in messages]
|
||||
def conversation_scope(messages: list[dict[str, Any]], namespace: str = "") -> str:
|
||||
scope_messages = [canonical_scope_message(message) for message in messages]
|
||||
payload: Any = scope_messages
|
||||
if namespace:
|
||||
payload = {"namespace": namespace, "messages": scope_messages}
|
||||
canonical = json.dumps(
|
||||
payload, ensure_ascii=False, sort_keys=True, separators=(",", ":")
|
||||
)
|
||||
|
|
@ -85,7 +88,14 @@ def conversation_scope(messages: list[dict[str, Any]]) -> str:
|
|||
|
||||
|
||||
class ReasoningStore:
|
||||
def __init__(self, reasoning_content_path: str | Path) -> None:
|
||||
def __init__(
|
||||
self,
|
||||
reasoning_content_path: str | Path,
|
||||
max_age_seconds: int | None = None,
|
||||
max_rows: int | None = None,
|
||||
) -> None:
|
||||
self.max_age_seconds = max_age_seconds
|
||||
self.max_rows = max_rows
|
||||
if str(reasoning_content_path) == ":memory:":
|
||||
self.reasoning_content_path: str | Path = ":memory:"
|
||||
else:
|
||||
|
|
@ -110,13 +120,14 @@ class ReasoningStore:
|
|||
"""
|
||||
)
|
||||
self._conn.commit()
|
||||
self.prune()
|
||||
|
||||
def close(self) -> None:
|
||||
with self._lock:
|
||||
self._conn.close()
|
||||
|
||||
def put(self, key: str, reasoning: str, message: dict[str, Any]) -> None:
|
||||
if not reasoning:
|
||||
if not isinstance(reasoning, str):
|
||||
return
|
||||
message_json = json.dumps(message, ensure_ascii=False, sort_keys=True)
|
||||
with self._lock:
|
||||
|
|
@ -131,6 +142,7 @@ class ReasoningStore:
|
|||
""",
|
||||
(key, reasoning, message_json, time.time()),
|
||||
)
|
||||
self._prune_locked()
|
||||
self._conn.commit()
|
||||
|
||||
def get(self, key: str) -> str | None:
|
||||
|
|
@ -147,7 +159,7 @@ class ReasoningStore:
|
|||
if message.get("role") != "assistant":
|
||||
return 0
|
||||
reasoning = message.get("reasoning_content")
|
||||
if not isinstance(reasoning, str) or not reasoning:
|
||||
if not isinstance(reasoning, str):
|
||||
return 0
|
||||
|
||||
keys = [f"scope:{scope}:signature:{message_signature(message)}"]
|
||||
|
|
@ -166,11 +178,11 @@ class ReasoningStore:
|
|||
|
||||
def lookup_for_message(self, message: dict[str, Any], scope: str) -> str | None:
|
||||
reasoning = self.get(f"scope:{scope}:signature:{message_signature(message)}")
|
||||
if reasoning:
|
||||
if reasoning is not None:
|
||||
return reasoning
|
||||
for tool_call_id in tool_call_ids(message):
|
||||
reasoning = self.get(f"scope:{scope}:tool_call:{tool_call_id}")
|
||||
if reasoning:
|
||||
if reasoning is not None:
|
||||
return reasoning
|
||||
for tool_call in message.get("tool_calls") or []:
|
||||
if not isinstance(tool_call, dict):
|
||||
|
|
@ -178,6 +190,46 @@ class ReasoningStore:
|
|||
reasoning = self.get(
|
||||
f"scope:{scope}:tool_call_signature:{tool_call_signature(tool_call)}"
|
||||
)
|
||||
if reasoning:
|
||||
if reasoning is not None:
|
||||
return reasoning
|
||||
return None
|
||||
|
||||
def clear(self) -> int:
|
||||
with self._lock:
|
||||
row = self._conn.execute("SELECT COUNT(*) FROM reasoning_cache").fetchone()
|
||||
count = int(row[0] if row else 0)
|
||||
self._conn.execute("DELETE FROM reasoning_cache")
|
||||
self._conn.commit()
|
||||
return count
|
||||
|
||||
def prune(self) -> int:
|
||||
with self._lock:
|
||||
deleted = self._prune_locked()
|
||||
self._conn.commit()
|
||||
return deleted
|
||||
|
||||
def _prune_locked(self) -> int:
|
||||
deleted = 0
|
||||
if self.max_age_seconds is not None and self.max_age_seconds > 0:
|
||||
cutoff = time.time() - self.max_age_seconds
|
||||
cursor = self._conn.execute(
|
||||
"DELETE FROM reasoning_cache WHERE created_at < ?",
|
||||
(cutoff,),
|
||||
)
|
||||
deleted += cursor.rowcount if cursor.rowcount != -1 else 0
|
||||
|
||||
if self.max_rows is not None and self.max_rows > 0:
|
||||
cursor = self._conn.execute(
|
||||
"""
|
||||
DELETE FROM reasoning_cache
|
||||
WHERE key NOT IN (
|
||||
SELECT key
|
||||
FROM reasoning_cache
|
||||
ORDER BY created_at DESC
|
||||
LIMIT ?
|
||||
)
|
||||
""",
|
||||
(self.max_rows,),
|
||||
)
|
||||
deleted += cursor.rowcount if cursor.rowcount != -1 else 0
|
||||
return deleted
|
||||
|
|
|
|||
|
|
@ -23,12 +23,20 @@ from .config import (
|
|||
from .reasoning_store import ReasoningStore, conversation_scope
|
||||
from .streaming import CursorReasoningDisplayAdapter, StreamAccumulator
|
||||
from .tunnel import NgrokTunnel, local_tunnel_target
|
||||
from .transform import prepare_upstream_request, rewrite_response_body
|
||||
from .transform import (
|
||||
PLACEHOLDER_REASONING_CONTENT,
|
||||
prepare_upstream_request,
|
||||
rewrite_response_body,
|
||||
)
|
||||
|
||||
|
||||
LOG = logging.getLogger("deepseek_cursor_proxy")
|
||||
|
||||
|
||||
class RequestBodyTooLarge(ValueError):
|
||||
pass
|
||||
|
||||
|
||||
class DeepSeekProxyServer(ThreadingHTTPServer):
|
||||
config: ProxyConfig
|
||||
reasoning_store: ReasoningStore
|
||||
|
|
@ -102,6 +110,12 @@ class DeepSeekProxyHandler(BaseHTTPRequestHandler):
|
|||
|
||||
try:
|
||||
payload = self._read_json_body()
|
||||
except RequestBodyTooLarge as exc:
|
||||
LOG.warning(
|
||||
"rejected request path=%s status=413 reason=%s", request_path, exc
|
||||
)
|
||||
self._send_json(413, {"error": {"message": str(exc)}})
|
||||
return
|
||||
except ValueError as exc:
|
||||
LOG.warning(
|
||||
"rejected request path=%s status=400 reason=%s", request_path, exc
|
||||
|
|
@ -114,28 +128,73 @@ class DeepSeekProxyHandler(BaseHTTPRequestHandler):
|
|||
|
||||
LOG.info("cursor request: %s", summarize_chat_payload(payload))
|
||||
|
||||
prepared = prepare_upstream_request(payload, self.config, self.reasoning_store)
|
||||
prepared = prepare_upstream_request(
|
||||
payload,
|
||||
self.config,
|
||||
self.reasoning_store,
|
||||
authorization=cursor_authorization,
|
||||
)
|
||||
if prepared.patched_reasoning_messages:
|
||||
LOG.info(
|
||||
"restored reasoning_content on %s assistant message(s)",
|
||||
prepared.patched_reasoning_messages,
|
||||
)
|
||||
if prepared.fallback_reasoning_messages:
|
||||
if prepared.placeholder_reasoning_messages:
|
||||
LOG.warning(
|
||||
"added compatibility reasoning_content placeholder on %s uncached assistant message(s)",
|
||||
prepared.fallback_reasoning_messages,
|
||||
(
|
||||
"inserted placeholder reasoning_content on %s assistant "
|
||||
"message(s); this is compatibility mode and may still be "
|
||||
"rejected by DeepSeek"
|
||||
),
|
||||
prepared.placeholder_reasoning_messages,
|
||||
)
|
||||
if prepared.missing_reasoning_messages:
|
||||
diagnostic_placeholder = (
|
||||
f"{PLACEHOLDER_REASONING_CONTENT} "
|
||||
"[not sent upstream because missing_reasoning_strategy=reject]"
|
||||
)
|
||||
LOG.warning(
|
||||
"rejected request path=%s status=409 reason=missing_reasoning_content count=%s",
|
||||
request_path,
|
||||
prepared.missing_reasoning_messages,
|
||||
)
|
||||
self._send_json(
|
||||
409,
|
||||
{
|
||||
"error": {
|
||||
"message": (
|
||||
"Missing cached DeepSeek reasoning_content for a "
|
||||
f"thinking-mode tool-call history on "
|
||||
f"{prepared.missing_reasoning_messages} assistant "
|
||||
"message(s). This usually means the chat has tool-call "
|
||||
"turns that were not captured by this proxy/cache. Start "
|
||||
"a new chat or retry from the original tool-call turn."
|
||||
),
|
||||
"type": "missing_reasoning_content",
|
||||
"code": "missing_reasoning_content",
|
||||
"missing_reasoning_messages": prepared.missing_reasoning_messages,
|
||||
"diagnostic_placeholder": diagnostic_placeholder,
|
||||
}
|
||||
},
|
||||
)
|
||||
return
|
||||
LOG.info(
|
||||
"deepseek send: %s patched=%s placeholder=%s",
|
||||
compact_request_stats(prepared.payload),
|
||||
prepared.patched_reasoning_messages,
|
||||
prepared.placeholder_reasoning_messages,
|
||||
)
|
||||
|
||||
if self.config.verbose:
|
||||
LOG.info(
|
||||
(
|
||||
"upstream request metadata: original_model=%s upstream_model=%s "
|
||||
"patched_reasoning=%s fallback_reasoning=%s %s"
|
||||
"patched_reasoning=%s missing_reasoning=%s %s"
|
||||
),
|
||||
prepared.original_model,
|
||||
prepared.upstream_model,
|
||||
prepared.patched_reasoning_messages,
|
||||
prepared.fallback_reasoning_messages,
|
||||
prepared.missing_reasoning_messages,
|
||||
summarize_chat_payload(prepared.payload),
|
||||
)
|
||||
|
||||
|
|
@ -191,22 +250,28 @@ class DeepSeekProxyHandler(BaseHTTPRequestHandler):
|
|||
)
|
||||
if prepared.payload.get("stream"):
|
||||
self._proxy_streaming_response(
|
||||
response, prepared.original_model, prepared.payload["messages"]
|
||||
response,
|
||||
prepared.original_model,
|
||||
prepared.payload["messages"],
|
||||
prepared.cache_namespace,
|
||||
)
|
||||
else:
|
||||
self._proxy_regular_response(
|
||||
response, prepared.original_model, prepared.payload["messages"]
|
||||
response,
|
||||
prepared.original_model,
|
||||
prepared.payload["messages"],
|
||||
prepared.cache_namespace,
|
||||
)
|
||||
LOG.info(
|
||||
(
|
||||
"request complete status=%s stream=%s elapsed_ms=%s "
|
||||
"patched_reasoning=%s fallback_reasoning=%s"
|
||||
"patched_reasoning=%s missing_reasoning=%s"
|
||||
),
|
||||
upstream_status,
|
||||
bool(prepared.payload.get("stream")),
|
||||
elapsed_ms(started),
|
||||
prepared.patched_reasoning_messages,
|
||||
prepared.fallback_reasoning_messages,
|
||||
prepared.missing_reasoning_messages,
|
||||
)
|
||||
|
||||
def _cursor_authorization(self) -> str | None:
|
||||
|
|
@ -217,6 +282,8 @@ class DeepSeekProxyHandler(BaseHTTPRequestHandler):
|
|||
return f"Bearer {token.strip()}"
|
||||
|
||||
def _send_cors_headers(self) -> None:
|
||||
if not self.config.cors:
|
||||
return
|
||||
self.send_header("Access-Control-Allow-Origin", "*")
|
||||
self.send_header("Access-Control-Allow-Methods", "POST, GET, OPTIONS")
|
||||
self.send_header(
|
||||
|
|
@ -239,18 +306,37 @@ class DeepSeekProxyHandler(BaseHTTPRequestHandler):
|
|||
|
||||
def _send_models(self) -> None:
|
||||
created = int(time.time())
|
||||
model_ids = list(
|
||||
dict.fromkeys(
|
||||
[
|
||||
self.config.upstream_model,
|
||||
"deepseek-v4-pro",
|
||||
"deepseek-v4-flash",
|
||||
]
|
||||
)
|
||||
)
|
||||
models = [
|
||||
{
|
||||
"id": self.config.upstream_model,
|
||||
"id": model_id,
|
||||
"object": "model",
|
||||
"created": created,
|
||||
"owned_by": "deepseek",
|
||||
}
|
||||
for model_id in model_ids
|
||||
]
|
||||
self._send_json(200, {"object": "list", "data": models})
|
||||
|
||||
def _read_json_body(self) -> dict[str, Any]:
|
||||
length = int(self.headers.get("Content-Length") or 0)
|
||||
try:
|
||||
length = int(self.headers.get("Content-Length") or 0)
|
||||
except ValueError as exc:
|
||||
raise ValueError("Invalid Content-Length") from exc
|
||||
if length < 0:
|
||||
raise ValueError("Invalid Content-Length")
|
||||
if length > self.config.max_request_body_bytes:
|
||||
raise RequestBodyTooLarge(
|
||||
f"Request body is too large; limit is {self.config.max_request_body_bytes} bytes"
|
||||
)
|
||||
raw_body = self.rfile.read(length)
|
||||
if not raw_body:
|
||||
raise ValueError("Request body is empty")
|
||||
|
|
@ -293,14 +379,20 @@ class DeepSeekProxyHandler(BaseHTTPRequestHandler):
|
|||
response: Any,
|
||||
original_model: str,
|
||||
request_messages: list[dict[str, Any]],
|
||||
cache_namespace: str,
|
||||
) -> None:
|
||||
body = read_response_body(response)
|
||||
try:
|
||||
body = rewrite_response_body(
|
||||
body, original_model, self.reasoning_store, request_messages
|
||||
body,
|
||||
original_model,
|
||||
self.reasoning_store,
|
||||
request_messages,
|
||||
cache_namespace,
|
||||
)
|
||||
except (json.JSONDecodeError, UnicodeDecodeError) as exc:
|
||||
LOG.warning("failed to rewrite upstream JSON response: %s", exc)
|
||||
log_usage_from_body(body)
|
||||
|
||||
if self.config.verbose:
|
||||
log_bytes("cursor response body", body)
|
||||
|
|
@ -319,6 +411,7 @@ class DeepSeekProxyHandler(BaseHTTPRequestHandler):
|
|||
response: Any,
|
||||
original_model: str,
|
||||
request_messages: list[dict[str, Any]],
|
||||
cache_namespace: str,
|
||||
) -> None:
|
||||
self.send_response(getattr(response, "status", 200))
|
||||
self._send_cors_headers()
|
||||
|
|
@ -334,7 +427,7 @@ class DeepSeekProxyHandler(BaseHTTPRequestHandler):
|
|||
if self.config.cursor_display_reasoning
|
||||
else None
|
||||
)
|
||||
scope = conversation_scope(request_messages)
|
||||
scope = conversation_scope(request_messages, cache_namespace)
|
||||
finalized = False
|
||||
while True:
|
||||
line = response.readline()
|
||||
|
|
@ -388,6 +481,10 @@ class DeepSeekProxyHandler(BaseHTTPRequestHandler):
|
|||
|
||||
if isinstance(chunk, dict):
|
||||
accumulator.ingest_chunk(chunk)
|
||||
stored = accumulator.store_finished_reasoning(self.reasoning_store, scope)
|
||||
if stored:
|
||||
LOG.info("stored %s streaming reasoning cache key(s)", stored)
|
||||
log_usage(chunk.get("usage"))
|
||||
if display_adapter is not None:
|
||||
display_adapter.rewrite_chunk(chunk)
|
||||
if "model" in chunk:
|
||||
|
|
@ -421,7 +518,7 @@ def build_arg_parser() -> argparse.ArgumentParser:
|
|||
)
|
||||
parser.add_argument(
|
||||
"--model",
|
||||
help="Upstream DeepSeek model, default from config, DEEPSEEK_MODEL, or deepseek-v4-pro",
|
||||
help="Fallback DeepSeek model when the request has no model, default from config, DEEPSEEK_MODEL, or deepseek-v4-pro",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--base-url",
|
||||
|
|
@ -450,6 +547,19 @@ def build_arg_parser() -> argparse.ArgumentParser:
|
|||
action="store_true",
|
||||
help="Do not mirror reasoning_content into Cursor-visible <think> content",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--missing-reasoning-strategy",
|
||||
choices=["reject", "placeholder"],
|
||||
help=(
|
||||
"What to do when required reasoning_content is missing: reject "
|
||||
"(safe default) or placeholder (unsafe compatibility fallback)"
|
||||
),
|
||||
)
|
||||
parser.add_argument(
|
||||
"--clear-reasoning-cache",
|
||||
action="store_true",
|
||||
help="Clear the local reasoning_content SQLite cache and exit",
|
||||
)
|
||||
return parser
|
||||
|
||||
|
||||
|
|
@ -474,6 +584,101 @@ def log_bytes(label: str, body: bytes) -> None:
|
|||
log_json(label, payload)
|
||||
|
||||
|
||||
def log_usage_from_body(body: bytes) -> None:
|
||||
try:
|
||||
payload = json.loads(body.decode("utf-8"))
|
||||
except (json.JSONDecodeError, UnicodeDecodeError):
|
||||
return
|
||||
if isinstance(payload, dict):
|
||||
log_usage(payload.get("usage"))
|
||||
|
||||
|
||||
def log_usage(usage: Any) -> None:
|
||||
if not isinstance(usage, dict):
|
||||
return
|
||||
summary = compact_usage_stats(usage)
|
||||
if summary is None:
|
||||
return
|
||||
LOG.info("deepseek usage: %s", summary)
|
||||
|
||||
|
||||
def compact_request_stats(payload: dict[str, Any]) -> str:
|
||||
messages = payload.get("messages")
|
||||
if not isinstance(messages, list):
|
||||
messages = []
|
||||
tools = payload.get("tools")
|
||||
reasoning_count = 0
|
||||
reasoning_chars = 0
|
||||
for message in messages:
|
||||
if not isinstance(message, dict) or message.get("role") != "assistant":
|
||||
continue
|
||||
reasoning = message.get("reasoning_content")
|
||||
if isinstance(reasoning, str):
|
||||
reasoning_count += 1
|
||||
reasoning_chars += len(reasoning)
|
||||
rounds = sum(
|
||||
1
|
||||
for message in messages
|
||||
if isinstance(message, dict) and message.get("role") == "user"
|
||||
)
|
||||
return (
|
||||
f"model={payload.get('model')} stream={int(bool(payload.get('stream')))} "
|
||||
f"rounds={rounds} msgs={len(messages)} "
|
||||
f"tools={len(tools) if isinstance(tools, list) else 0} "
|
||||
f"reasoning={reasoning_count}/{reasoning_chars}ch"
|
||||
)
|
||||
|
||||
|
||||
def compact_usage_stats(usage: dict[str, Any]) -> str | None:
|
||||
prompt_tokens = usage.get("prompt_tokens")
|
||||
completion_tokens = usage.get("completion_tokens")
|
||||
total_tokens = usage.get("total_tokens")
|
||||
hit_tokens = usage.get("prompt_cache_hit_tokens")
|
||||
miss_tokens = usage.get("prompt_cache_miss_tokens")
|
||||
details = usage.get("completion_tokens_details")
|
||||
reasoning_tokens = None
|
||||
if isinstance(details, dict):
|
||||
reasoning_tokens = details.get("reasoning_tokens")
|
||||
|
||||
if all(
|
||||
value is None
|
||||
for value in (
|
||||
prompt_tokens,
|
||||
completion_tokens,
|
||||
total_tokens,
|
||||
hit_tokens,
|
||||
miss_tokens,
|
||||
reasoning_tokens,
|
||||
)
|
||||
):
|
||||
return None
|
||||
|
||||
cache_summary = "cache=?"
|
||||
if hit_tokens is not None or miss_tokens is not None:
|
||||
hit = int_or_zero(hit_tokens)
|
||||
miss = int_or_zero(miss_tokens)
|
||||
cache_total = hit + miss
|
||||
if cache_total:
|
||||
cache_summary = f"cache={hit}/{miss} hit={hit / cache_total:.1%}"
|
||||
else:
|
||||
cache_summary = f"cache={hit}/{miss}"
|
||||
|
||||
return (
|
||||
f"prompt={prompt_tokens if prompt_tokens is not None else '?'} "
|
||||
f"completion={completion_tokens if completion_tokens is not None else '?'} "
|
||||
f"total={total_tokens if total_tokens is not None else '?'} "
|
||||
f"{cache_summary} "
|
||||
f"reasoning={reasoning_tokens if reasoning_tokens is not None else '?'}"
|
||||
)
|
||||
|
||||
|
||||
def int_or_zero(value: Any) -> int:
|
||||
try:
|
||||
return int(value or 0)
|
||||
except (TypeError, ValueError):
|
||||
return 0
|
||||
|
||||
|
||||
def sse_data(payload: dict[str, Any]) -> bytes:
|
||||
return (
|
||||
b"data: "
|
||||
|
|
@ -509,6 +714,16 @@ def read_response_body(response: Any) -> bytes:
|
|||
return body
|
||||
|
||||
|
||||
def warn_if_insecure_upstream(url: str) -> None:
|
||||
parsed = urlparse(url)
|
||||
if parsed.scheme != "http":
|
||||
return
|
||||
host = parsed.hostname or ""
|
||||
if host in {"127.0.0.1", "localhost", "::1"}:
|
||||
return
|
||||
LOG.warning("upstream base_url uses plain HTTP; bearer tokens may be exposed")
|
||||
|
||||
|
||||
def main(argv: list[str] | None = None) -> int:
|
||||
logging.basicConfig(
|
||||
level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s"
|
||||
|
|
@ -536,27 +751,51 @@ def main(argv: list[str] | None = None) -> int:
|
|||
updates["verbose"] = True
|
||||
if args.no_cursor_display_reasoning:
|
||||
updates["cursor_display_reasoning"] = False
|
||||
if args.missing_reasoning_strategy:
|
||||
updates["missing_reasoning_strategy"] = args.missing_reasoning_strategy
|
||||
if updates:
|
||||
config = replace(config, **updates)
|
||||
|
||||
store = ReasoningStore(config.reasoning_content_path)
|
||||
warn_if_insecure_upstream(config.upstream_base_url)
|
||||
store = ReasoningStore(
|
||||
config.reasoning_content_path,
|
||||
max_age_seconds=config.reasoning_cache_max_age_seconds,
|
||||
max_rows=config.reasoning_cache_max_rows,
|
||||
)
|
||||
if args.clear_reasoning_cache:
|
||||
deleted = store.clear()
|
||||
LOG.info("cleared %s reasoning cache row(s)", deleted)
|
||||
store.close()
|
||||
return 0
|
||||
server = DeepSeekProxyServer((config.host, config.port), DeepSeekProxyHandler)
|
||||
server.config = config
|
||||
server.reasoning_store = store
|
||||
|
||||
LOG.info("listening on http://%s:%s/v1", config.host, config.port)
|
||||
LOG.info(
|
||||
"forwarding to %s/chat/completions as %s",
|
||||
"forwarding to %s/chat/completions default_model=%s",
|
||||
config.upstream_base_url,
|
||||
config.upstream_model,
|
||||
)
|
||||
LOG.info(
|
||||
"thinking=%s reasoning_effort=%s cursor_display_reasoning=%s reasoning_content_path=%s",
|
||||
(
|
||||
"thinking=%s reasoning_effort=%s cursor_display_reasoning=%s "
|
||||
"missing_reasoning_strategy=%s reasoning_content_path=%s"
|
||||
),
|
||||
config.thinking,
|
||||
config.reasoning_effort,
|
||||
config.cursor_display_reasoning,
|
||||
config.missing_reasoning_strategy,
|
||||
config.reasoning_content_path,
|
||||
)
|
||||
if config.missing_reasoning_strategy == "placeholder":
|
||||
LOG.warning(
|
||||
(
|
||||
"missing_reasoning_strategy=placeholder is not DeepSeek-compliant; "
|
||||
"use only to test old Cursor histories whose original reasoning "
|
||||
"cannot be recovered"
|
||||
)
|
||||
)
|
||||
if config.verbose:
|
||||
LOG.info("logging mode=verbose metadata=detailed bodies=true")
|
||||
LOG.warning(
|
||||
|
|
|
|||
|
|
@ -16,6 +16,7 @@ class StreamingChoice:
|
|||
role: str = "assistant"
|
||||
content: str = ""
|
||||
reasoning_content: str = ""
|
||||
has_reasoning_content: bool = False
|
||||
tool_calls: list[dict[str, Any]] = field(default_factory=list)
|
||||
finish_reason: str | None = None
|
||||
|
||||
|
|
@ -24,7 +25,7 @@ class StreamingChoice:
|
|||
"role": self.role,
|
||||
"content": self.content,
|
||||
}
|
||||
if self.reasoning_content:
|
||||
if self.has_reasoning_content:
|
||||
message["reasoning_content"] = self.reasoning_content
|
||||
if self.tool_calls:
|
||||
message["tool_calls"] = self.tool_calls
|
||||
|
|
@ -34,6 +35,7 @@ class StreamingChoice:
|
|||
class StreamAccumulator:
|
||||
def __init__(self) -> None:
|
||||
self.choices: dict[int, StreamingChoice] = {}
|
||||
self._stored_choices: set[int] = set()
|
||||
|
||||
def ingest_chunk(self, chunk: dict[str, Any]) -> None:
|
||||
choices = chunk.get("choices")
|
||||
|
|
@ -63,14 +65,22 @@ class StreamAccumulator:
|
|||
|
||||
reasoning_content = delta.get("reasoning_content")
|
||||
if isinstance(reasoning_content, str):
|
||||
choice.has_reasoning_content = True
|
||||
choice.reasoning_content += reasoning_content
|
||||
|
||||
self._merge_tool_call_deltas(choice, delta.get("tool_calls"))
|
||||
|
||||
def store_reasoning(self, store: ReasoningStore, scope: str) -> int:
|
||||
stored = 0
|
||||
for choice in self.choices.values():
|
||||
stored += store.store_assistant_message(choice.to_message(), scope)
|
||||
for index, choice in self.choices.items():
|
||||
stored += self._store_choice(index, choice, store, scope)
|
||||
return stored
|
||||
|
||||
def store_finished_reasoning(self, store: ReasoningStore, scope: str) -> int:
|
||||
stored = 0
|
||||
for index, choice in self.choices.items():
|
||||
if choice.finish_reason is not None:
|
||||
stored += self._store_choice(index, choice, store, scope)
|
||||
return stored
|
||||
|
||||
def messages(self) -> list[dict[str, Any]]:
|
||||
|
|
@ -115,6 +125,20 @@ class StreamAccumulator:
|
|||
function_delta["arguments"]
|
||||
)
|
||||
|
||||
def _store_choice(
|
||||
self,
|
||||
index: int,
|
||||
choice: StreamingChoice,
|
||||
store: ReasoningStore,
|
||||
scope: str,
|
||||
) -> int:
|
||||
if index in self._stored_choices:
|
||||
return 0
|
||||
stored = store.store_assistant_message(choice.to_message(), scope)
|
||||
if stored:
|
||||
self._stored_choices.add(index)
|
||||
return stored
|
||||
|
||||
|
||||
class CursorReasoningDisplayAdapter:
|
||||
"""Mirror reasoning_content into content for Cursor's visible thinking UI path."""
|
||||
|
|
|
|||
|
|
@ -1,6 +1,7 @@
|
|||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
import hashlib
|
||||
import json
|
||||
import re
|
||||
from typing import Any
|
||||
|
|
@ -66,14 +67,23 @@ CURSOR_THINKING_BLOCK_RE = re.compile(
|
|||
re.IGNORECASE,
|
||||
)
|
||||
|
||||
PLACEHOLDER_REASONING_CONTENT = (
|
||||
"[deepseek-cursor-proxy placeholder reasoning_content: original DeepSeek "
|
||||
"reasoning_content was missing from Cursor history and unavailable in the "
|
||||
"local cache. This is an opt-in compatibility fallback, not the original "
|
||||
"model reasoning.]"
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class PreparedRequest:
|
||||
payload: dict[str, Any]
|
||||
original_model: str
|
||||
upstream_model: str
|
||||
cache_namespace: str
|
||||
patched_reasoning_messages: int
|
||||
fallback_reasoning_messages: int
|
||||
placeholder_reasoning_messages: int
|
||||
missing_reasoning_messages: int
|
||||
|
||||
|
||||
def normalize_reasoning_effort(value: Any) -> str:
|
||||
|
|
@ -158,26 +168,30 @@ def legacy_function_to_tool(function: Any) -> dict[str, Any]:
|
|||
|
||||
def convert_function_call(function_call: Any) -> Any:
|
||||
if isinstance(function_call, str):
|
||||
if function_call in {"auto", "none"}:
|
||||
if function_call in {"auto", "none", "required"}:
|
||||
return function_call
|
||||
if function_call == "required":
|
||||
return "auto"
|
||||
return None
|
||||
if isinstance(function_call, dict) and function_call.get("name"):
|
||||
return "auto"
|
||||
return {
|
||||
"type": "function",
|
||||
"function": {"name": str(function_call["name"])},
|
||||
}
|
||||
return None
|
||||
|
||||
|
||||
def normalize_tool_choice(tool_choice: Any) -> Any:
|
||||
if isinstance(tool_choice, str):
|
||||
if tool_choice in {"auto", "none"}:
|
||||
if tool_choice in {"auto", "none", "required"}:
|
||||
return tool_choice
|
||||
if tool_choice == "required":
|
||||
return "auto"
|
||||
return None
|
||||
if isinstance(tool_choice, dict):
|
||||
if tool_choice.get("type") == "function":
|
||||
return "auto"
|
||||
function = tool_choice.get("function")
|
||||
if isinstance(function, dict) and function.get("name"):
|
||||
return {
|
||||
"type": "function",
|
||||
"function": {"name": str(function["name"])},
|
||||
}
|
||||
return tool_choice
|
||||
return tool_choice
|
||||
|
||||
|
|
@ -186,7 +200,11 @@ def normalize_message(
|
|||
message: Any,
|
||||
store: ReasoningStore | None,
|
||||
prior_messages: list[dict[str, Any]],
|
||||
) -> tuple[dict[str, Any], bool, bool]:
|
||||
cache_namespace: str,
|
||||
repair_reasoning: bool,
|
||||
keep_reasoning: bool,
|
||||
missing_reasoning_strategy: str,
|
||||
) -> tuple[dict[str, Any], bool, bool, bool]:
|
||||
if not isinstance(message, dict):
|
||||
message = {"role": "user", "content": str(message)}
|
||||
normalized = {key: value for key, value in message.items() if key in MESSAGE_FIELDS}
|
||||
|
|
@ -210,49 +228,72 @@ def normalize_message(
|
|||
]
|
||||
|
||||
patched = False
|
||||
fallback = False
|
||||
placeholder = False
|
||||
missing = False
|
||||
if normalized["role"] == "assistant":
|
||||
reasoning = normalized.get("reasoning_content")
|
||||
if not isinstance(reasoning, str) or not reasoning:
|
||||
if not keep_reasoning:
|
||||
normalized.pop("reasoning_content", None)
|
||||
if store is not None:
|
||||
restored = store.lookup_for_message(
|
||||
normalized, conversation_scope(prior_messages)
|
||||
elif repair_reasoning:
|
||||
reasoning = normalized.get("reasoning_content")
|
||||
if not isinstance(reasoning, str):
|
||||
normalized.pop("reasoning_content", None)
|
||||
needs_reasoning = assistant_needs_reasoning_for_tool_context(
|
||||
normalized, prior_messages
|
||||
)
|
||||
if restored:
|
||||
normalized["reasoning_content"] = restored
|
||||
patched = True
|
||||
if not patched and assistant_needs_reasoning_for_tool_context(
|
||||
normalized, prior_messages
|
||||
):
|
||||
normalized["reasoning_content"] = fallback_reasoning_content(normalized)
|
||||
fallback = True
|
||||
if needs_reasoning and store is not None:
|
||||
restored = store.lookup_for_message(
|
||||
normalized,
|
||||
conversation_scope(prior_messages, cache_namespace),
|
||||
)
|
||||
if restored is not None:
|
||||
normalized["reasoning_content"] = restored
|
||||
patched = True
|
||||
if needs_reasoning and not patched:
|
||||
if missing_reasoning_strategy == "placeholder":
|
||||
normalized["reasoning_content"] = PLACEHOLDER_REASONING_CONTENT
|
||||
placeholder = True
|
||||
else:
|
||||
missing = True
|
||||
|
||||
allowed_fields = ROLE_MESSAGE_FIELDS.get(str(normalized["role"]), MESSAGE_FIELDS)
|
||||
normalized = {
|
||||
key: value for key, value in normalized.items() if key in allowed_fields
|
||||
}
|
||||
return normalized, patched, fallback
|
||||
return normalized, patched, placeholder, missing
|
||||
|
||||
|
||||
def normalize_messages(
|
||||
messages: Any, store: ReasoningStore | None
|
||||
) -> tuple[list[dict[str, Any]], int, int]:
|
||||
messages: Any,
|
||||
store: ReasoningStore | None,
|
||||
cache_namespace: str,
|
||||
repair_reasoning: bool,
|
||||
keep_reasoning: bool,
|
||||
missing_reasoning_strategy: str,
|
||||
) -> tuple[list[dict[str, Any]], int, int, int]:
|
||||
if not isinstance(messages, list):
|
||||
return [], 0, 0
|
||||
return [], 0, 0, 0
|
||||
normalized_messages: list[dict[str, Any]] = []
|
||||
patched_count = 0
|
||||
fallback_count = 0
|
||||
placeholder_count = 0
|
||||
missing_count = 0
|
||||
for message in messages:
|
||||
normalized, patched, fallback = normalize_message(
|
||||
message, store, normalized_messages
|
||||
normalized, patched, placeholder, missing = normalize_message(
|
||||
message,
|
||||
store,
|
||||
normalized_messages,
|
||||
cache_namespace,
|
||||
repair_reasoning,
|
||||
keep_reasoning,
|
||||
missing_reasoning_strategy,
|
||||
)
|
||||
normalized_messages.append(normalized)
|
||||
if patched:
|
||||
patched_count += 1
|
||||
if fallback:
|
||||
fallback_count += 1
|
||||
return normalized_messages, patched_count, fallback_count
|
||||
if placeholder:
|
||||
placeholder_count += 1
|
||||
if missing:
|
||||
missing_count += 1
|
||||
return normalized_messages, patched_count, placeholder_count, missing_count
|
||||
|
||||
|
||||
def assistant_needs_reasoning_for_tool_context(
|
||||
|
|
@ -270,22 +311,40 @@ def assistant_needs_reasoning_for_tool_context(
|
|||
return False
|
||||
|
||||
|
||||
def fallback_reasoning_content(message: dict[str, Any]) -> str:
|
||||
if message.get("tool_calls"):
|
||||
return "Compatibility placeholder: Cursor omitted DeepSeek reasoning_content for this tool-call turn."
|
||||
return "Compatibility placeholder: Cursor omitted DeepSeek reasoning_content for this tool-result turn."
|
||||
|
||||
|
||||
def upstream_model_for(original_model: str, config: ProxyConfig) -> str:
|
||||
if config.allow_model_passthrough and original_model.startswith("deepseek-"):
|
||||
if original_model.startswith("deepseek-"):
|
||||
return original_model
|
||||
return config.upstream_model
|
||||
|
||||
|
||||
def reasoning_cache_namespace(
|
||||
config: ProxyConfig,
|
||||
upstream_model: str,
|
||||
thinking: Any,
|
||||
reasoning_effort: Any,
|
||||
authorization: str | None = None,
|
||||
) -> str:
|
||||
auth_hash = ""
|
||||
if authorization:
|
||||
auth_hash = hashlib.sha256(authorization.encode("utf-8")).hexdigest()
|
||||
payload = {
|
||||
"base_url": config.upstream_base_url,
|
||||
"model": upstream_model,
|
||||
"thinking": thinking,
|
||||
"reasoning_effort": reasoning_effort,
|
||||
"authorization_hash": auth_hash,
|
||||
}
|
||||
canonical = json.dumps(
|
||||
payload, ensure_ascii=False, sort_keys=True, separators=(",", ":")
|
||||
)
|
||||
return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
|
||||
|
||||
|
||||
def prepare_upstream_request(
|
||||
payload: dict[str, Any],
|
||||
config: ProxyConfig,
|
||||
store: ReasoningStore | None,
|
||||
authorization: str | None = None,
|
||||
) -> PreparedRequest:
|
||||
original_model = str(payload.get("model") or config.upstream_model)
|
||||
upstream_model = upstream_model_for(original_model, config)
|
||||
|
|
@ -297,10 +356,14 @@ def prepare_upstream_request(
|
|||
prepared["max_tokens"] = payload["max_completion_tokens"]
|
||||
|
||||
prepared["model"] = upstream_model
|
||||
messages, patched_count, fallback_count = normalize_messages(
|
||||
payload.get("messages"), store
|
||||
)
|
||||
prepared["messages"] = messages
|
||||
if prepared.get("stream"):
|
||||
stream_options = prepared.get("stream_options")
|
||||
if not isinstance(stream_options, dict):
|
||||
stream_options = {}
|
||||
else:
|
||||
stream_options = dict(stream_options)
|
||||
stream_options["include_usage"] = True
|
||||
prepared["stream_options"] = stream_options
|
||||
|
||||
if "tools" in prepared and isinstance(prepared["tools"], list):
|
||||
prepared["tools"] = [normalize_tool(tool) for tool in prepared["tools"]]
|
||||
|
|
@ -325,17 +388,39 @@ def prepare_upstream_request(
|
|||
|
||||
thinking = prepared.get("thinking")
|
||||
thinking_enabled = isinstance(thinking, dict) and thinking.get("type") == "enabled"
|
||||
thinking_disabled = (
|
||||
isinstance(thinking, dict) and thinking.get("type") == "disabled"
|
||||
)
|
||||
if thinking_enabled:
|
||||
prepared["reasoning_effort"] = normalize_reasoning_effort(
|
||||
prepared.get("reasoning_effort") or config.reasoning_effort
|
||||
)
|
||||
|
||||
cache_namespace = reasoning_cache_namespace(
|
||||
config,
|
||||
upstream_model,
|
||||
prepared.get("thinking"),
|
||||
prepared.get("reasoning_effort"),
|
||||
authorization,
|
||||
)
|
||||
messages, patched_count, placeholder_count, missing_count = normalize_messages(
|
||||
payload.get("messages"),
|
||||
store,
|
||||
cache_namespace,
|
||||
repair_reasoning=thinking_enabled,
|
||||
keep_reasoning=not thinking_disabled,
|
||||
missing_reasoning_strategy=config.missing_reasoning_strategy,
|
||||
)
|
||||
prepared["messages"] = messages
|
||||
|
||||
return PreparedRequest(
|
||||
payload=prepared,
|
||||
original_model=original_model,
|
||||
upstream_model=upstream_model,
|
||||
cache_namespace=cache_namespace,
|
||||
patched_reasoning_messages=patched_count,
|
||||
fallback_reasoning_messages=fallback_count,
|
||||
placeholder_reasoning_messages=placeholder_count,
|
||||
missing_reasoning_messages=missing_count,
|
||||
)
|
||||
|
||||
|
||||
|
|
@ -343,6 +428,7 @@ def record_response_reasoning(
|
|||
response_payload: dict[str, Any],
|
||||
store: ReasoningStore | None,
|
||||
request_messages: list[dict[str, Any]],
|
||||
cache_namespace: str = "",
|
||||
) -> int:
|
||||
if store is None:
|
||||
return 0
|
||||
|
|
@ -350,7 +436,7 @@ def record_response_reasoning(
|
|||
choices = response_payload.get("choices")
|
||||
if not isinstance(choices, list):
|
||||
return stored
|
||||
scope = conversation_scope(request_messages)
|
||||
scope = conversation_scope(request_messages, cache_namespace)
|
||||
for choice in choices:
|
||||
if not isinstance(choice, dict):
|
||||
continue
|
||||
|
|
@ -365,10 +451,13 @@ def rewrite_response_body(
|
|||
original_model: str,
|
||||
store: ReasoningStore | None,
|
||||
request_messages: list[dict[str, Any]],
|
||||
cache_namespace: str = "",
|
||||
) -> bytes:
|
||||
response_payload = json.loads(body.decode("utf-8"))
|
||||
if isinstance(response_payload, dict):
|
||||
record_response_reasoning(response_payload, store, request_messages)
|
||||
record_response_reasoning(
|
||||
response_payload, store, request_messages, cache_namespace
|
||||
)
|
||||
if "model" in response_payload:
|
||||
response_payload["model"] = original_model
|
||||
return json.dumps(
|
||||
|
|
|
|||
|
|
@ -140,12 +140,30 @@ class ConfigTests(unittest.TestCase):
|
|||
env={
|
||||
"PROXY_VERBOSE": "true",
|
||||
"PROXY_NGROK": "yes",
|
||||
"PROXY_CORS": "true",
|
||||
"PROXY_MAX_REQUEST_BODY_BYTES": "1234",
|
||||
"MISSING_REASONING_STRATEGY": "placeholder",
|
||||
"REASONING_CACHE_MAX_AGE_SECONDS": "60",
|
||||
"REASONING_CACHE_MAX_ROWS": "50",
|
||||
},
|
||||
config_path=Path("/does/not/exist"),
|
||||
)
|
||||
|
||||
self.assertTrue(config.verbose)
|
||||
self.assertTrue(config.ngrok)
|
||||
self.assertTrue(config.cors)
|
||||
self.assertEqual(config.max_request_body_bytes, 1234)
|
||||
self.assertEqual(config.missing_reasoning_strategy, "placeholder")
|
||||
self.assertEqual(config.reasoning_cache_max_age_seconds, 60)
|
||||
self.assertEqual(config.reasoning_cache_max_rows, 50)
|
||||
|
||||
def test_invalid_missing_reasoning_strategy_defaults_to_reject(self) -> None:
|
||||
config = ProxyConfig.from_file(
|
||||
env={"MISSING_REASONING_STRATEGY": "invent"},
|
||||
config_path=Path("/does/not/exist"),
|
||||
)
|
||||
|
||||
self.assertEqual(config.missing_reasoning_strategy, "reject")
|
||||
|
||||
def test_cursor_reasoning_display_can_be_disabled_from_config(self) -> None:
|
||||
with TemporaryDirectory() as temp_dir:
|
||||
|
|
|
|||
|
|
@ -16,6 +16,10 @@ from deepseek_cursor_proxy.reasoning_store import (
|
|||
message_signature,
|
||||
)
|
||||
from deepseek_cursor_proxy.server import DeepSeekProxyHandler, DeepSeekProxyServer
|
||||
from deepseek_cursor_proxy.transform import (
|
||||
PLACEHOLDER_REASONING_CONTENT,
|
||||
reasoning_cache_namespace,
|
||||
)
|
||||
|
||||
|
||||
TOOL_REASONING = "I need the current date before answering."
|
||||
|
|
@ -253,6 +257,85 @@ class ReasoningStreamingDeepSeekHandler(BaseHTTPRequestHandler):
|
|||
self.wfile.flush()
|
||||
|
||||
|
||||
class ToolCallStreamingBeforeDoneDeepSeekHandler(BaseHTTPRequestHandler):
|
||||
requests: list[dict] = []
|
||||
|
||||
def log_message(self, fmt: str, *args: object) -> None:
|
||||
return
|
||||
|
||||
def do_POST(self) -> None:
|
||||
length = int(self.headers.get("Content-Length") or 0)
|
||||
payload = json.loads(self.rfile.read(length).decode("utf-8"))
|
||||
self.__class__.requests.append(payload)
|
||||
|
||||
if payload.get("stream"):
|
||||
self.send_response(200)
|
||||
self.send_header("Content-Type", "text/event-stream")
|
||||
self.end_headers()
|
||||
chunks = [
|
||||
{
|
||||
"id": "chatcmpl-stream-tool",
|
||||
"object": "chat.completion.chunk",
|
||||
"created": 1,
|
||||
"model": "deepseek-v4-pro",
|
||||
"choices": [
|
||||
{
|
||||
"index": 0,
|
||||
"delta": {
|
||||
"role": "assistant",
|
||||
"reasoning_content": "Streamed tool reasoning.",
|
||||
"tool_calls": [
|
||||
{
|
||||
"index": 0,
|
||||
"id": "call_stream_tool",
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "lookup",
|
||||
"arguments": "{}",
|
||||
},
|
||||
}
|
||||
],
|
||||
},
|
||||
"finish_reason": None,
|
||||
}
|
||||
],
|
||||
},
|
||||
{
|
||||
"id": "chatcmpl-stream-tool",
|
||||
"object": "chat.completion.chunk",
|
||||
"created": 1,
|
||||
"model": "deepseek-v4-pro",
|
||||
"choices": [
|
||||
{"index": 0, "delta": {}, "finish_reason": "tool_calls"}
|
||||
],
|
||||
},
|
||||
]
|
||||
for chunk in chunks:
|
||||
self.wfile.write(f"data: {json.dumps(chunk)}\n\n".encode("utf-8"))
|
||||
self.wfile.flush()
|
||||
time.sleep(1)
|
||||
self.wfile.write(b"data: [DONE]\n\n")
|
||||
self.wfile.flush()
|
||||
return
|
||||
|
||||
messages = payload.get("messages", [])
|
||||
if (
|
||||
len(messages) >= 2
|
||||
and messages[1].get("reasoning_content") == "Streamed tool reasoning."
|
||||
):
|
||||
self._send_json(200, plain_response("stream follow-up accepted"))
|
||||
return
|
||||
self._send_json(400, {"error": {"message": "missing streamed reasoning"}})
|
||||
|
||||
def _send_json(self, status: int, payload: dict) -> None:
|
||||
body = json.dumps(payload).encode("utf-8")
|
||||
self.send_response(status)
|
||||
self.send_header("Content-Type", "application/json")
|
||||
self.send_header("Content-Length", str(len(body)))
|
||||
self.end_headers()
|
||||
self.wfile.write(body)
|
||||
|
||||
|
||||
def tool_call_response() -> dict:
|
||||
return {
|
||||
"id": "chatcmpl-tool",
|
||||
|
|
@ -277,6 +360,14 @@ def tool_call_response() -> dict:
|
|||
},
|
||||
}
|
||||
],
|
||||
"usage": {
|
||||
"prompt_tokens": 20,
|
||||
"completion_tokens": 5,
|
||||
"total_tokens": 25,
|
||||
"prompt_cache_hit_tokens": 12,
|
||||
"prompt_cache_miss_tokens": 8,
|
||||
"completion_tokens_details": {"reasoning_tokens": 3},
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
|
|
@ -474,6 +565,14 @@ class ProxyEndToEndTests(unittest.TestCase):
|
|||
output = "\n".join(captured.output)
|
||||
self.assertEqual(status, 200)
|
||||
self.assertIn("cursor request: model='deepseek-v4-pro'", output)
|
||||
self.assertIn(
|
||||
"deepseek send: model=deepseek-v4-pro stream=0 rounds=1 msgs=1 tools=1 reasoning=0/0ch",
|
||||
output,
|
||||
)
|
||||
self.assertIn(
|
||||
"deepseek usage: prompt=20 completion=5 total=25 cache=12/8 hit=60.0% reasoning=3",
|
||||
output,
|
||||
)
|
||||
self.assertIn("request complete status=200", output)
|
||||
self.assertNotIn("What is tomorrow's date?", output)
|
||||
self.assertNotIn("sk-from-cursor", output)
|
||||
|
|
@ -511,17 +610,55 @@ class ProxyEndToEndTests(unittest.TestCase):
|
|||
self.assertEqual(caught.exception.code, 401)
|
||||
self.assertEqual(FakeDeepSeekHandler.requests, [])
|
||||
|
||||
def test_proxy_adds_fallback_reasoning_for_uncached_cursor_tool_history(
|
||||
def test_proxy_rejects_oversized_request_body(self) -> None:
|
||||
self.proxy.server.config = replace(
|
||||
self.proxy.server.config, max_request_body_bytes=10
|
||||
)
|
||||
|
||||
status, payload = post_json(
|
||||
f"{self.proxy.url}/v1/chat/completions",
|
||||
first_cursor_request(),
|
||||
)
|
||||
|
||||
self.assertEqual(status, 413)
|
||||
self.assertIn("too large", payload["error"]["message"])
|
||||
self.assertEqual(FakeDeepSeekHandler.requests, [])
|
||||
|
||||
def test_proxy_rejects_uncached_cursor_tool_history_without_placeholder(
|
||||
self,
|
||||
) -> None:
|
||||
status, payload = post_json(
|
||||
f"{self.proxy.url}/v1/chat/completions",
|
||||
second_cursor_request(include_reasoning=False),
|
||||
)
|
||||
|
||||
self.assertEqual(status, 409)
|
||||
self.assertEqual(payload["error"]["missing_reasoning_messages"], 1)
|
||||
self.assertIn("1 assistant message", payload["error"]["message"])
|
||||
self.assertIn(
|
||||
"not sent upstream",
|
||||
payload["error"]["diagnostic_placeholder"],
|
||||
)
|
||||
self.assertEqual(FakeDeepSeekHandler.requests, [])
|
||||
|
||||
def test_proxy_can_forward_placeholder_for_uncached_cursor_tool_history(
|
||||
self,
|
||||
) -> None:
|
||||
self.proxy.server.config = replace(
|
||||
self.proxy.server.config,
|
||||
missing_reasoning_strategy="placeholder",
|
||||
)
|
||||
|
||||
status, _ = post_json(
|
||||
f"{self.proxy.url}/v1/chat/completions",
|
||||
second_cursor_request(include_reasoning=False),
|
||||
)
|
||||
|
||||
self.assertEqual(status, 200)
|
||||
upstream_messages = FakeDeepSeekHandler.requests[0]["messages"]
|
||||
self.assertIn("reasoning_content", upstream_messages[1])
|
||||
self.assertEqual(
|
||||
FakeDeepSeekHandler.requests[0]["messages"][1]["reasoning_content"],
|
||||
PLACEHOLDER_REASONING_CONTENT,
|
||||
)
|
||||
|
||||
|
||||
class InterleavedConversationTests(unittest.TestCase):
|
||||
|
|
@ -737,10 +874,17 @@ class ReasoningStreamingProxyTests(unittest.TestCase):
|
|||
"content": FINAL_CONTENT,
|
||||
"reasoning_content": "Need context.",
|
||||
}
|
||||
cache_namespace = reasoning_cache_namespace(
|
||||
self.proxy.server.config,
|
||||
"deepseek-v4-pro",
|
||||
{"type": "enabled"},
|
||||
"high",
|
||||
"Bearer sk-cursor-test",
|
||||
)
|
||||
self.assertEqual(
|
||||
self.store.get(
|
||||
"scope:"
|
||||
+ conversation_scope(request_messages)
|
||||
+ conversation_scope(request_messages, cache_namespace)
|
||||
+ ":signature:"
|
||||
+ message_signature(stored_message)
|
||||
),
|
||||
|
|
@ -748,6 +892,107 @@ class ReasoningStreamingProxyTests(unittest.TestCase):
|
|||
)
|
||||
|
||||
|
||||
class StreamingToolRaceProxyTests(unittest.TestCase):
|
||||
def setUp(self) -> None:
|
||||
ToolCallStreamingBeforeDoneDeepSeekHandler.requests = []
|
||||
self.upstream = ServerFixture(
|
||||
ThreadingHTTPServer(
|
||||
("127.0.0.1", 0), ToolCallStreamingBeforeDoneDeepSeekHandler
|
||||
)
|
||||
).start()
|
||||
self.store = ReasoningStore(":memory:")
|
||||
proxy = DeepSeekProxyServer(("127.0.0.1", 0), DeepSeekProxyHandler)
|
||||
proxy.config = ProxyConfig(
|
||||
upstream_base_url=self.upstream.url,
|
||||
upstream_model="deepseek-v4-pro",
|
||||
)
|
||||
proxy.reasoning_store = self.store
|
||||
self.proxy = ServerFixture(proxy).start()
|
||||
|
||||
def tearDown(self) -> None:
|
||||
self.proxy.close()
|
||||
self.upstream.close()
|
||||
self.store.close()
|
||||
|
||||
def test_streaming_tool_reasoning_is_available_before_done(self) -> None:
|
||||
request_messages = [{"role": "user", "content": "stream tool"}]
|
||||
request = Request(
|
||||
f"{self.proxy.url}/v1/chat/completions",
|
||||
data=json.dumps(
|
||||
{
|
||||
"model": "deepseek-v4-pro",
|
||||
"stream": True,
|
||||
"messages": request_messages,
|
||||
"tools": [
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "lookup",
|
||||
"parameters": {"type": "object", "properties": {}},
|
||||
},
|
||||
}
|
||||
],
|
||||
}
|
||||
).encode("utf-8"),
|
||||
method="POST",
|
||||
headers={
|
||||
"Authorization": "Bearer sk-cursor-test",
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
)
|
||||
|
||||
with urlopen(request, timeout=3) as response:
|
||||
while True:
|
||||
line = response.readline().decode("utf-8")
|
||||
self.assertNotEqual(line, "")
|
||||
if '"finish_reason":"tool_calls"' in line:
|
||||
break
|
||||
|
||||
status, payload = post_json(
|
||||
f"{self.proxy.url}/v1/chat/completions",
|
||||
{
|
||||
"model": "deepseek-v4-pro",
|
||||
"messages": [
|
||||
*request_messages,
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "",
|
||||
"tool_calls": [
|
||||
{
|
||||
"id": "call_stream_tool",
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "lookup",
|
||||
"arguments": "{}",
|
||||
},
|
||||
}
|
||||
],
|
||||
},
|
||||
{
|
||||
"role": "tool",
|
||||
"tool_call_id": "call_stream_tool",
|
||||
"content": "tool result",
|
||||
},
|
||||
],
|
||||
"tools": [
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "lookup",
|
||||
"parameters": {"type": "object", "properties": {}},
|
||||
},
|
||||
}
|
||||
],
|
||||
},
|
||||
)
|
||||
response.read()
|
||||
|
||||
self.assertEqual(status, 200, payload)
|
||||
self.assertEqual(
|
||||
payload["choices"][0]["message"]["content"], "stream follow-up accepted"
|
||||
)
|
||||
|
||||
|
||||
def first_cursor_request() -> dict:
|
||||
return {
|
||||
"model": "deepseek-v4-pro",
|
||||
|
|
|
|||
|
|
@ -5,7 +5,7 @@ import stat
|
|||
from tempfile import TemporaryDirectory
|
||||
import unittest
|
||||
|
||||
from deepseek_cursor_proxy.reasoning_store import ReasoningStore
|
||||
from deepseek_cursor_proxy.reasoning_store import ReasoningStore, conversation_scope
|
||||
|
||||
|
||||
class ReasoningStoreTests(unittest.TestCase):
|
||||
|
|
@ -21,6 +21,50 @@ class ReasoningStoreTests(unittest.TestCase):
|
|||
self.assertTrue(reasoning_content_path.exists())
|
||||
self.assertEqual(stat.S_IMODE(reasoning_content_path.stat().st_mode), 0o600)
|
||||
|
||||
def test_store_prunes_to_max_rows_and_can_clear(self) -> None:
|
||||
store = ReasoningStore(":memory:", max_rows=2)
|
||||
try:
|
||||
store.put("a", "reasoning a", {"role": "assistant"})
|
||||
store.put("b", "reasoning b", {"role": "assistant"})
|
||||
store.put("c", "reasoning c", {"role": "assistant"})
|
||||
|
||||
self.assertIsNone(store.get("a"))
|
||||
self.assertEqual(store.get("b"), "reasoning b")
|
||||
self.assertEqual(store.get("c"), "reasoning c")
|
||||
self.assertEqual(store.clear(), 2)
|
||||
self.assertIsNone(store.get("b"))
|
||||
self.assertIsNone(store.get("c"))
|
||||
finally:
|
||||
store.close()
|
||||
|
||||
def test_empty_reasoning_content_is_stored_as_present_value(self) -> None:
|
||||
store = ReasoningStore(":memory:")
|
||||
try:
|
||||
scope = conversation_scope([{"role": "user", "content": "lookup"}])
|
||||
tool_call = {
|
||||
"id": "call_empty",
|
||||
"type": "function",
|
||||
"function": {"name": "lookup", "arguments": "{}"},
|
||||
}
|
||||
message = {
|
||||
"role": "assistant",
|
||||
"content": "",
|
||||
"reasoning_content": "",
|
||||
"tool_calls": [tool_call],
|
||||
}
|
||||
|
||||
self.assertGreater(store.store_assistant_message(message, scope), 0)
|
||||
self.assertEqual(store.get(f"scope:{scope}:tool_call:call_empty"), "")
|
||||
self.assertEqual(
|
||||
store.lookup_for_message(
|
||||
{"role": "assistant", "content": "", "tool_calls": [tool_call]},
|
||||
scope,
|
||||
),
|
||||
"",
|
||||
)
|
||||
finally:
|
||||
store.close()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
|
|
|
|||
|
|
@ -77,6 +77,84 @@ class StreamAccumulatorTests(unittest.TestCase):
|
|||
)
|
||||
store.close()
|
||||
|
||||
def test_stores_reasoning_when_choice_finishes_before_done(self) -> None:
|
||||
store = ReasoningStore(":memory:")
|
||||
accumulator = StreamAccumulator()
|
||||
accumulator.ingest_chunk(
|
||||
{
|
||||
"choices": [
|
||||
{
|
||||
"index": 0,
|
||||
"delta": {
|
||||
"role": "assistant",
|
||||
"reasoning_content": "Need a tool.",
|
||||
"tool_calls": [
|
||||
{
|
||||
"index": 0,
|
||||
"id": "call_stream",
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "lookup",
|
||||
"arguments": "{}",
|
||||
},
|
||||
}
|
||||
],
|
||||
},
|
||||
"finish_reason": "tool_calls",
|
||||
}
|
||||
]
|
||||
}
|
||||
)
|
||||
|
||||
scope = conversation_scope([{"role": "user", "content": "lookup"}])
|
||||
stored = accumulator.store_finished_reasoning(store, scope)
|
||||
|
||||
self.assertGreater(stored, 0)
|
||||
self.assertEqual(
|
||||
store.get(f"scope:{scope}:tool_call:call_stream"), "Need a tool."
|
||||
)
|
||||
self.assertEqual(accumulator.store_reasoning(store, scope), 0)
|
||||
store.close()
|
||||
|
||||
def test_stores_empty_reasoning_content_when_stream_field_is_present(
|
||||
self,
|
||||
) -> None:
|
||||
store = ReasoningStore(":memory:")
|
||||
accumulator = StreamAccumulator()
|
||||
accumulator.ingest_chunk(
|
||||
{
|
||||
"choices": [
|
||||
{
|
||||
"index": 0,
|
||||
"delta": {
|
||||
"role": "assistant",
|
||||
"reasoning_content": "",
|
||||
"tool_calls": [
|
||||
{
|
||||
"index": 0,
|
||||
"id": "call_empty",
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "lookup",
|
||||
"arguments": "{}",
|
||||
},
|
||||
}
|
||||
],
|
||||
},
|
||||
"finish_reason": "tool_calls",
|
||||
}
|
||||
]
|
||||
}
|
||||
)
|
||||
|
||||
scope = conversation_scope([{"role": "user", "content": "lookup"}])
|
||||
stored = accumulator.store_finished_reasoning(store, scope)
|
||||
|
||||
self.assertGreater(stored, 0)
|
||||
self.assertEqual(store.get(f"scope:{scope}:tool_call:call_empty"), "")
|
||||
self.assertEqual(accumulator.messages()[0]["reasoning_content"], "")
|
||||
store.close()
|
||||
|
||||
def test_returns_accumulated_messages_for_logging(self) -> None:
|
||||
accumulator = StreamAccumulator()
|
||||
accumulator.ingest_chunk(
|
||||
|
|
|
|||
|
|
@ -6,13 +6,28 @@ import unittest
|
|||
from deepseek_cursor_proxy.config import ProxyConfig
|
||||
from deepseek_cursor_proxy.reasoning_store import ReasoningStore, conversation_scope
|
||||
from deepseek_cursor_proxy.transform import (
|
||||
PLACEHOLDER_REASONING_CONTENT,
|
||||
extract_text_content,
|
||||
prepare_upstream_request,
|
||||
reasoning_cache_namespace,
|
||||
rewrite_response_body,
|
||||
strip_cursor_thinking_blocks,
|
||||
)
|
||||
|
||||
|
||||
DEFAULT_CONFIG = ProxyConfig()
|
||||
DEFAULT_CACHE_NAMESPACE = reasoning_cache_namespace(
|
||||
DEFAULT_CONFIG,
|
||||
"deepseek-v4-pro",
|
||||
{"type": "enabled"},
|
||||
"high",
|
||||
)
|
||||
|
||||
|
||||
def cache_scope(messages: list[dict]) -> str:
|
||||
return conversation_scope(messages, DEFAULT_CACHE_NAMESPACE)
|
||||
|
||||
|
||||
class TransformTests(unittest.TestCase):
|
||||
def setUp(self) -> None:
|
||||
self.store = ReasoningStore(":memory:")
|
||||
|
|
@ -75,19 +90,44 @@ class TransformTests(unittest.TestCase):
|
|||
prepared = prepare_upstream_request(payload, config, self.store)
|
||||
|
||||
self.assertEqual(prepared.original_model, "deepseek-v4-flash")
|
||||
self.assertEqual(prepared.upstream_model, "deepseek-v4-pro")
|
||||
self.assertEqual(prepared.payload["model"], "deepseek-v4-pro")
|
||||
self.assertEqual(prepared.upstream_model, "deepseek-v4-flash")
|
||||
self.assertEqual(prepared.payload["model"], "deepseek-v4-flash")
|
||||
self.assertEqual(prepared.payload["thinking"], {"type": "enabled"})
|
||||
self.assertEqual(prepared.payload["reasoning_effort"], "high")
|
||||
self.assertEqual(prepared.payload["max_tokens"], 123)
|
||||
self.assertEqual(prepared.payload["tools"][0]["type"], "function")
|
||||
self.assertEqual(
|
||||
prepared.payload["tool_choice"],
|
||||
"auto",
|
||||
{"type": "function", "function": {"name": "lookup"}},
|
||||
)
|
||||
self.assertNotIn("parallel_tool_calls", prepared.payload)
|
||||
|
||||
def test_normalizes_unsupported_required_tool_choice_to_auto(self) -> None:
|
||||
def test_uses_config_model_only_when_request_model_is_missing(self) -> None:
|
||||
prepared = prepare_upstream_request(
|
||||
{"messages": [{"role": "user", "content": "hi"}]},
|
||||
ProxyConfig(upstream_model="deepseek-v4-flash"),
|
||||
self.store,
|
||||
)
|
||||
|
||||
self.assertEqual(prepared.original_model, "deepseek-v4-flash")
|
||||
self.assertEqual(prepared.upstream_model, "deepseek-v4-flash")
|
||||
self.assertEqual(prepared.payload["model"], "deepseek-v4-flash")
|
||||
|
||||
def test_streaming_requests_include_usage_for_runtime_stats(self) -> None:
|
||||
prepared = prepare_upstream_request(
|
||||
{
|
||||
"model": "deepseek-v4-pro",
|
||||
"stream": True,
|
||||
"stream_options": {"include_usage": False},
|
||||
"messages": [{"role": "user", "content": "hi"}],
|
||||
},
|
||||
ProxyConfig(),
|
||||
self.store,
|
||||
)
|
||||
|
||||
self.assertEqual(prepared.payload["stream_options"]["include_usage"], True)
|
||||
|
||||
def test_preserves_required_tool_choice(self) -> None:
|
||||
payload = {
|
||||
"model": "deepseek-v4-pro",
|
||||
"messages": [{"role": "user", "content": "call a tool"}],
|
||||
|
|
@ -97,7 +137,25 @@ class TransformTests(unittest.TestCase):
|
|||
|
||||
prepared = prepare_upstream_request(payload, ProxyConfig(), self.store)
|
||||
|
||||
self.assertEqual(prepared.payload["tool_choice"], "auto")
|
||||
self.assertEqual(prepared.payload["tool_choice"], "required")
|
||||
|
||||
def test_preserves_named_tool_choice(self) -> None:
|
||||
payload = {
|
||||
"model": "deepseek-v4-pro",
|
||||
"messages": [{"role": "user", "content": "call lookup"}],
|
||||
"tools": [{"type": "function", "function": {"name": "lookup"}}],
|
||||
"tool_choice": {
|
||||
"type": "function",
|
||||
"function": {"name": "lookup"},
|
||||
},
|
||||
}
|
||||
|
||||
prepared = prepare_upstream_request(payload, ProxyConfig(), self.store)
|
||||
|
||||
self.assertEqual(
|
||||
prepared.payload["tool_choice"],
|
||||
{"type": "function", "function": {"name": "lookup"}},
|
||||
)
|
||||
|
||||
def test_restores_reasoning_content_for_cached_tool_call(self) -> None:
|
||||
prior_messages = [{"role": "user", "content": "read README"}]
|
||||
|
|
@ -117,7 +175,7 @@ class TransformTests(unittest.TestCase):
|
|||
],
|
||||
}
|
||||
self.store.store_assistant_message(
|
||||
assistant_message, conversation_scope(prior_messages)
|
||||
assistant_message, cache_scope(prior_messages)
|
||||
)
|
||||
|
||||
payload = {
|
||||
|
|
@ -151,6 +209,81 @@ class TransformTests(unittest.TestCase):
|
|||
"Need the file contents before answering.",
|
||||
)
|
||||
|
||||
def test_accepts_empty_reasoning_content_when_present_for_tool_call(
|
||||
self,
|
||||
) -> None:
|
||||
payload = {
|
||||
"model": "deepseek-v4-pro",
|
||||
"messages": [
|
||||
{"role": "user", "content": "read README"},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "",
|
||||
"reasoning_content": "",
|
||||
"tool_calls": [
|
||||
{
|
||||
"id": "call_empty",
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "read_file",
|
||||
"arguments": '{"path":"README.md"}',
|
||||
},
|
||||
}
|
||||
],
|
||||
},
|
||||
{"role": "tool", "tool_call_id": "call_empty", "content": "file text"},
|
||||
],
|
||||
}
|
||||
|
||||
prepared = prepare_upstream_request(payload, ProxyConfig(), self.store)
|
||||
|
||||
self.assertEqual(prepared.patched_reasoning_messages, 0)
|
||||
self.assertEqual(prepared.missing_reasoning_messages, 0)
|
||||
self.assertIn("reasoning_content", prepared.payload["messages"][1])
|
||||
self.assertEqual(prepared.payload["messages"][1]["reasoning_content"], "")
|
||||
|
||||
def test_restores_empty_reasoning_content_from_cache(self) -> None:
|
||||
prior_messages = [{"role": "user", "content": "read README"}]
|
||||
tool_call = {
|
||||
"id": "call_empty",
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "read_file",
|
||||
"arguments": '{"path":"README.md"}',
|
||||
},
|
||||
}
|
||||
self.store.store_assistant_message(
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "",
|
||||
"reasoning_content": "",
|
||||
"tool_calls": [tool_call],
|
||||
},
|
||||
cache_scope(prior_messages),
|
||||
)
|
||||
|
||||
prepared = prepare_upstream_request(
|
||||
{
|
||||
"model": "deepseek-v4-pro",
|
||||
"messages": [
|
||||
*prior_messages,
|
||||
{"role": "assistant", "content": "", "tool_calls": [tool_call]},
|
||||
{
|
||||
"role": "tool",
|
||||
"tool_call_id": "call_empty",
|
||||
"content": "file text",
|
||||
},
|
||||
],
|
||||
},
|
||||
ProxyConfig(),
|
||||
self.store,
|
||||
)
|
||||
|
||||
self.assertEqual(prepared.patched_reasoning_messages, 1)
|
||||
self.assertEqual(prepared.missing_reasoning_messages, 0)
|
||||
self.assertIn("reasoning_content", prepared.payload["messages"][1])
|
||||
self.assertEqual(prepared.payload["messages"][1]["reasoning_content"], "")
|
||||
|
||||
def test_restores_reasoning_content_for_cached_final_tool_turn_message(
|
||||
self,
|
||||
) -> None:
|
||||
|
|
@ -179,7 +312,7 @@ class TransformTests(unittest.TestCase):
|
|||
"reasoning_content": "The tool result is enough to answer.",
|
||||
}
|
||||
self.store.store_assistant_message(
|
||||
assistant_message, conversation_scope(prior_messages)
|
||||
assistant_message, cache_scope(prior_messages)
|
||||
)
|
||||
|
||||
payload = {
|
||||
|
|
@ -235,8 +368,8 @@ class TransformTests(unittest.TestCase):
|
|||
prior_a = [{"role": "user", "content": "thread A"}]
|
||||
prior_b = [{"role": "user", "content": "thread B"}]
|
||||
|
||||
self.store.store_assistant_message(assistant_a, conversation_scope(prior_a))
|
||||
self.store.store_assistant_message(assistant_b, conversation_scope(prior_b))
|
||||
self.store.store_assistant_message(assistant_a, cache_scope(prior_a))
|
||||
self.store.store_assistant_message(assistant_b, cache_scope(prior_b))
|
||||
|
||||
payload_a = {
|
||||
"model": "deepseek-v4-pro",
|
||||
|
|
@ -267,7 +400,7 @@ class TransformTests(unittest.TestCase):
|
|||
|
||||
def test_exact_message_signature_wins_over_tool_call_id_fallback(self) -> None:
|
||||
prior = [{"role": "user", "content": "same conversation prefix"}]
|
||||
scope = conversation_scope(prior)
|
||||
scope = cache_scope(prior)
|
||||
first_tool_call = {
|
||||
"id": "call_reused",
|
||||
"type": "function",
|
||||
|
|
@ -336,7 +469,7 @@ class TransformTests(unittest.TestCase):
|
|||
}
|
||||
],
|
||||
}
|
||||
self.store.store_assistant_message(assistant_message, conversation_scope(prior))
|
||||
self.store.store_assistant_message(assistant_message, cache_scope(prior))
|
||||
|
||||
payload = {
|
||||
"model": "deepseek-v4-pro",
|
||||
|
|
@ -386,7 +519,7 @@ class TransformTests(unittest.TestCase):
|
|||
"reasoning_content": "Need to call the file tool.",
|
||||
"tool_calls": [tool_call],
|
||||
},
|
||||
conversation_scope(prior),
|
||||
cache_scope(prior),
|
||||
)
|
||||
|
||||
prepared = prepare_upstream_request(
|
||||
|
|
@ -412,7 +545,7 @@ class TransformTests(unittest.TestCase):
|
|||
"Need to call the file tool.",
|
||||
)
|
||||
|
||||
def test_adds_fallback_reasoning_for_uncached_assistant_tool_call(self) -> None:
|
||||
def test_reports_missing_reasoning_for_uncached_assistant_tool_call(self) -> None:
|
||||
payload = {
|
||||
"model": "deepseek-v4-pro",
|
||||
"messages": [
|
||||
|
|
@ -442,10 +575,51 @@ class TransformTests(unittest.TestCase):
|
|||
prepared = prepare_upstream_request(payload, ProxyConfig(), self.store)
|
||||
|
||||
self.assertEqual(prepared.patched_reasoning_messages, 0)
|
||||
self.assertEqual(prepared.fallback_reasoning_messages, 1)
|
||||
self.assertIn("reasoning_content", prepared.payload["messages"][1])
|
||||
self.assertEqual(prepared.missing_reasoning_messages, 1)
|
||||
self.assertNotIn("reasoning_content", prepared.payload["messages"][1])
|
||||
|
||||
def test_adds_fallback_reasoning_for_uncached_assistant_after_tool_result(
|
||||
def test_can_insert_placeholder_for_uncached_assistant_tool_call(self) -> None:
|
||||
payload = {
|
||||
"model": "deepseek-v4-pro",
|
||||
"messages": [
|
||||
{"role": "user", "content": "read README"},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "",
|
||||
"tool_calls": [
|
||||
{
|
||||
"id": "call_uncached",
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "read_file",
|
||||
"arguments": '{"path":"README.md"}',
|
||||
},
|
||||
}
|
||||
],
|
||||
},
|
||||
{
|
||||
"role": "tool",
|
||||
"tool_call_id": "call_uncached",
|
||||
"content": "file text",
|
||||
},
|
||||
],
|
||||
}
|
||||
|
||||
prepared = prepare_upstream_request(
|
||||
payload,
|
||||
ProxyConfig(missing_reasoning_strategy="placeholder"),
|
||||
self.store,
|
||||
)
|
||||
|
||||
self.assertEqual(prepared.patched_reasoning_messages, 0)
|
||||
self.assertEqual(prepared.placeholder_reasoning_messages, 1)
|
||||
self.assertEqual(prepared.missing_reasoning_messages, 0)
|
||||
self.assertEqual(
|
||||
prepared.payload["messages"][1]["reasoning_content"],
|
||||
PLACEHOLDER_REASONING_CONTENT,
|
||||
)
|
||||
|
||||
def test_reports_missing_reasoning_for_uncached_assistant_after_tool_result(
|
||||
self,
|
||||
) -> None:
|
||||
payload = {
|
||||
|
|
@ -479,10 +653,10 @@ class TransformTests(unittest.TestCase):
|
|||
|
||||
prepared = prepare_upstream_request(payload, ProxyConfig(), self.store)
|
||||
|
||||
self.assertEqual(prepared.fallback_reasoning_messages, 1)
|
||||
self.assertIn("reasoning_content", prepared.payload["messages"][3])
|
||||
self.assertEqual(prepared.missing_reasoning_messages, 1)
|
||||
self.assertNotIn("reasoning_content", prepared.payload["messages"][3])
|
||||
|
||||
def test_does_not_add_fallback_reasoning_for_plain_chat_history(self) -> None:
|
||||
def test_does_not_report_missing_reasoning_for_plain_chat_history(self) -> None:
|
||||
payload = {
|
||||
"model": "deepseek-v4-pro",
|
||||
"messages": [
|
||||
|
|
@ -494,7 +668,86 @@ class TransformTests(unittest.TestCase):
|
|||
|
||||
prepared = prepare_upstream_request(payload, ProxyConfig(), self.store)
|
||||
|
||||
self.assertEqual(prepared.fallback_reasoning_messages, 0)
|
||||
self.assertEqual(prepared.missing_reasoning_messages, 0)
|
||||
self.assertNotIn("reasoning_content", prepared.payload["messages"][1])
|
||||
|
||||
def test_does_not_repair_reasoning_when_thinking_is_disabled(self) -> None:
|
||||
payload = {
|
||||
"model": "deepseek-v4-pro",
|
||||
"messages": [
|
||||
{"role": "user", "content": "read README"},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "",
|
||||
"reasoning_content": "Should be removed in non-thinking mode.",
|
||||
"tool_calls": [
|
||||
{
|
||||
"id": "call_uncached",
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "read_file",
|
||||
"arguments": '{"path":"README.md"}',
|
||||
},
|
||||
}
|
||||
],
|
||||
},
|
||||
{
|
||||
"role": "tool",
|
||||
"tool_call_id": "call_uncached",
|
||||
"content": "file text",
|
||||
},
|
||||
],
|
||||
}
|
||||
|
||||
prepared = prepare_upstream_request(
|
||||
payload, ProxyConfig(thinking="disabled"), self.store
|
||||
)
|
||||
|
||||
self.assertEqual(prepared.missing_reasoning_messages, 0)
|
||||
self.assertNotIn("reasoning_content", prepared.payload["messages"][1])
|
||||
|
||||
def test_reasoning_cache_is_namespaced_by_authorization(self) -> None:
|
||||
config = ProxyConfig()
|
||||
prior = [{"role": "user", "content": "read README"}]
|
||||
namespace_a = reasoning_cache_namespace(
|
||||
config,
|
||||
config.upstream_model,
|
||||
{"type": "enabled"},
|
||||
"high",
|
||||
"Bearer key-a",
|
||||
)
|
||||
tool_call = {
|
||||
"id": "call_123",
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "read_file",
|
||||
"arguments": '{"path":"README.md"}',
|
||||
},
|
||||
}
|
||||
self.store.store_assistant_message(
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "",
|
||||
"reasoning_content": "Reasoning for key A.",
|
||||
"tool_calls": [tool_call],
|
||||
},
|
||||
conversation_scope(prior, namespace_a),
|
||||
)
|
||||
|
||||
prepared = prepare_upstream_request(
|
||||
{
|
||||
"model": "deepseek-v4-pro",
|
||||
"messages": [
|
||||
*prior,
|
||||
{"role": "assistant", "content": "", "tool_calls": [tool_call]},
|
||||
],
|
||||
},
|
||||
config,
|
||||
self.store,
|
||||
authorization="Bearer key-b",
|
||||
)
|
||||
|
||||
self.assertEqual(prepared.missing_reasoning_messages, 1)
|
||||
self.assertNotIn("reasoning_content", prepared.payload["messages"][1])
|
||||
|
||||
def test_converted_function_message_uses_tool_schema(self) -> None:
|
||||
|
|
@ -561,6 +814,35 @@ class TransformTests(unittest.TestCase):
|
|||
"I need to inspect the repo.",
|
||||
)
|
||||
|
||||
def test_rewrite_response_preserves_prompt_cache_usage_fields(self) -> None:
|
||||
body = json.dumps(
|
||||
{
|
||||
"id": "chatcmpl-test",
|
||||
"object": "chat.completion",
|
||||
"model": "deepseek-v4-pro",
|
||||
"choices": [
|
||||
{
|
||||
"index": 0,
|
||||
"finish_reason": "stop",
|
||||
"message": {"role": "assistant", "content": "ok"},
|
||||
}
|
||||
],
|
||||
"usage": {
|
||||
"prompt_tokens": 10,
|
||||
"prompt_cache_hit_tokens": 6,
|
||||
"prompt_cache_miss_tokens": 4,
|
||||
"completion_tokens": 1,
|
||||
"total_tokens": 11,
|
||||
},
|
||||
}
|
||||
).encode()
|
||||
|
||||
rewritten = rewrite_response_body(body, "deepseek-v4-flash", self.store, [])
|
||||
payload = json.loads(rewritten)
|
||||
|
||||
self.assertEqual(payload["usage"]["prompt_cache_hit_tokens"], 6)
|
||||
self.assertEqual(payload["usage"]["prompt_cache_miss_tokens"], 4)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
|
|
|
|||
Loading…
Reference in New Issue