Make the repository's primary source tree genuinely Python

The old tracked TypeScript snapshot has been removed from the repository history, and the root directory is now a Python porting workspace. The README and tests now describe and verify the Python-first layout instead of treating the exposed snapshot as the active source tree.

A local archive can still exist outside Git, but the tracked repository now presents only the Python porting surface, related essay context, and OmX workflow artifacts.

Constraint: Tracked history should collapse to a single commit while excluding the archived snapshot from Git
Rejected: Keep the exposed TypeScript tree in tracked history under an archive path | user explicitly wanted only the Python porting repo state in Git
Confidence: medium
Scope-risk: broad
Reversibility: messy
Directive: Keep future tracked additions focused on the Python port itself; do not reintroduce the exposed snapshot into Git history
Tested: python3 -m unittest discover -s tests -v; python3 -m src.main summary; git diff --check
Not-tested: Behavioral parity with the original TypeScript system beyond the current Python workspace surface

Author: instructkr
Date: 2026-03-31 05:38:29 -07:00
Commit: 7c3c5f7eb9
14 changed files with 435 additions and 0 deletions

2
.gitignore vendored Normal file

@@ -0,0 +1,2 @@
__pycache__/
archive/


@@ -0,0 +1,88 @@
# Is legal the same as legitimate: AI reimplementation and the erosion of copyleft
- **Date:** March 9, 2026
- **Author:** Hong Minhee
- **Source context:** _Hong Minhee on Things_ (English / 日本語 / 朝鮮語 (國漢文) / 한국어 (한글))
- **Archive note:** This copy was normalized from user-provided text for this repository's research/archive context. Site navigation/footer language links were converted into metadata.
Last week, Dan Blanchard, the maintainer of chardet—a Python library for detecting text encodings used by roughly 130 million projects a month—released a new version. Version 7.0 is 48 times faster than its predecessor, supports multiple cores, and was redesigned from the ground up. Anthropic's Claude is listed as a contributor. The license changed from LGPL to MIT.
Blanchard's account is that he never looked at the existing source code directly. He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch. The resulting code shares less than 1.3% similarity with any prior version, as measured by JPlag. His conclusion: this is an independent new work, and he is under no obligation to carry forward the LGPL. Mark Pilgrim, the library's original author, opened a GitHub issue to object. The LGPL requires that modifications be distributed under the same license, and a reimplementation produced with ample exposure to the original codebase cannot, in Pilgrim's view, pass as a clean-room effort.
The dispute drew responses from two prominent figures in the open source world. Armin Ronacher, the creator of Flask, welcomed the relicensing. Salvatore Sanfilippo (antirez), the creator of Redis, published a broader defense of AI reimplementation, grounding it in copyright law and the history of the GNU project. Both conclude, by different routes, that what Blanchard did is legitimate. I respect both writers, and I think both are wrong—or more precisely, both are evading the question that actually matters.
That question is this: does legal mean legitimate? Neither piece answers it. Both move from “this is legally permissible” to “this is therefore fine,” without pausing at the gap between those two claims. Law sets a floor; clearing it does not mean the conduct is right. That gap is where this essay begins.
## The analogy points the wrong way
Antirez builds his case on history. When the GNU project reimplemented the UNIX userspace, it was lawful. So was Linux. Copyright law prohibits copying “protected expressions”—the actual code, its structure, its specific mechanisms—but it does not protect ideas or behavior. AI-assisted reimplementation occupies the same legal ground. Therefore, it is lawful.
The legal analysis is largely correct, and I am not disputing it. The problem lies in what antirez does next: he presents the legal conclusion as if it were also a social one, and uses a historical analogy that, examined more carefully, argues against his own position.
When GNU reimplemented the UNIX userspace, the vector ran from proprietary to free. Stallman was using the limits of copyright law to turn proprietary software into free software. The ethical force of that project did not come from its legal permissibility—it came from the direction it was moving, from the fact that it was expanding the commons. That is why people cheered.
The vector in the chardet case runs the other way. Software protected by a copyleft license—one that guarantees users the right to study, modify, and redistribute derivative works under the same terms—has been reimplemented under a permissive license that carries no such guarantee. This is not a reimplementation that expands the commons. It is one that removes the fencing that protected the commons. Derivative works built on chardet 7.0 are under no obligation to share their source code. That obligation, which applied to a library downloaded 130 million times a month, is now gone.
Antirez does not address this directional difference. He invokes the GNU precedent, but that precedent is a counterexample to his conclusion, not a supporting one.
## Does the GPL work against sharing?
Ronacher's argument is different. He discloses upfront that he has a stake in the outcome: “I personally have a horse in the race here because I too wanted chardet to be under a non-GPL license for many years. So consider me a very biased person in that regard.” He goes on to write that he considers “the GPL to run against that spirit by restricting what can be done with it”—the spirit being that society is better off when we share.
This claim rests on a fundamental misreading of what the GPL does.
Start with what the GPL actually prohibits. It does not prohibit keeping source code private. It imposes no constraint on privately modifying GPL software and using it yourself. The GPL's conditions are triggered only by distribution. If you distribute modified code, or offer it as a networked service, you must make the source available under the same terms. This is not a restriction on sharing. It is a condition placed on sharing: if you share, you must share in kind.
The requirement that improvements be returned to the commons is not a mechanism that suppresses sharing. It is a mechanism that makes sharing recursive and self-reinforcing. The claim that imposing contribution obligations on users of a commons undermines sharing culture does not hold together logically.
The contrast with the MIT license clarifies the point. Under MIT, anyone may take code, improve it, and close it off into a proprietary product. You can receive from the commons without giving back. If Ronacher calls this structure “more share-friendly,” he is using a concept of sharing with a specific directionality built in: sharing flows toward whoever has more capital and more engineers to take advantage of it.
The historical record bears this out. In the 1990s, companies routinely absorbed GPL code into proprietary products—not because they had chosen permissive licenses, but because copyleft enforcement was slack. The strengthening of copyleft mechanisms closed that gap. For individual developers and small projects without the resources to compete on anything but reciprocity, copyleft was what made the exchange approximately fair.
The creator of Flask knows this distinction. If he elides it anyway, the argument is not naïve—it is convenient.
## A self-refuting example
The most interesting moment in Ronacher's piece is not the argument but a detail he mentions in passing: Vercel reimplemented GNU Bash using AI and published it, then got visibly upset when Cloudflare reimplemented Next.js the same way.
Ronacher notes this as an irony and moves on. But the irony cuts deeper than he lets on. Next.js is MIT licensed. Cloudflare's vinext did not violate any license—it did exactly what Ronacher calls a contribution to the culture of openness, applied to a permissively licensed codebase. Vercel's reaction had nothing to do with license infringement; it was purely competitive and territorial. The implicit position is: reimplementing GPL software as MIT is a victory for sharing, but having our own MIT software reimplemented by a competitor is cause for outrage. This is what the claim that permissive licensing is “more share-friendly” than copyleft looks like in practice. The spirit of sharing, it turns out, runs in one direction only: outward from oneself.
Ronacher registers the contradiction and does not stop. “This development plays into my worldview,” he writes. When you present evidence that cuts against your own position, acknowledge it, and then proceed to your original conclusion unchanged, that is a signal that the conclusion preceded the argument.
## Legality and social legitimacy are different registers
Back to the question posed at the start. Is legal the same as legitimate?
Antirez closes his careful legal analysis as though it settles the matter. Ronacher acknowledges that “there is an obvious moral question here, but that isn't necessarily what I'm interested in.” Both pieces treat legal permissibility as a proxy for social legitimacy. But law only says what conduct it will not prevent—it does not certify that conduct as right. Aggressive tax minimization that never crosses into illegality may still be widely regarded as antisocial. A pharmaceutical company that legally acquires a patent on a long-generic drug and raises the price a hundredfold has done something legal, but that does not make it fine. Legality is a necessary condition; it is not a sufficient one.
In the chardet case, the distinction is sharper still. What the LGPL protected was not Blanchard's labor alone. It was a social compact agreed to by everyone who contributed to the library over twelve years. The terms of that compact were: if you take this and build on it, you share back under the same terms. This compact operated as a legal instrument, yes, but it was also the foundation of trust that made contribution rational. The fact that a reimplementation may qualify legally as a new work, and the fact that it breaks faith with the original contributors, are separate questions. If a court eventually rules in Blanchard's favor, that ruling will tell us what the law permits. It will not tell us that the act was right.
Zoë Kooyman, executive director of the FSF, put it plainly: “Refusing to grant others the rights you yourself received as a user is highly antisocial, no matter what method you use.”
## Whose perspective is the default?
Reading this debate, I keep returning to a question about position. From where are these two writers looking at the situation?
Antirez created Redis. Ronacher created Flask. Both are figures at the center of the open source ecosystem, with large audiences and well-established reputations. For them, the falling cost of AI reimplementation means something specific: it is easier to reimplement things they want in a different form. Ronacher says explicitly that he had begun reimplementing GNU Readline precisely because of its copyleft terms.
For the people who have spent years contributing to a library like chardet, the same shift in costs means something else entirely: the copyleft protection around their contributions can be removed. The two writers are speaking from the former position to people in the latter, telling them that this was always lawful, that historical precedent supports it, and that the appropriate response is adaptation.
When positional asymmetry of this kind is ignored, and the argument is presented as universal analysis, what you get is not analysis but rationalization. Both writers arrive at conclusions that align precisely with their own interests. Readers should hold that fact in mind.
## What this fight points toward
Bruce Perens, who wrote the original Open Source Definition, told The Register: “The entire economics of software development are dead, gone, over, kaput!” He meant it as an alarm. Antirez, from a similar assessment of the situation, draws the conclusion: adapt. Ronacher says he finds the direction exciting.
None of the three responses addresses the central question. When copyleft becomes technically easier to circumvent, does that make it less necessary, or more?
I think more. What the GPL protected was not the scarcity of code but the freedom of users. The fact that producing code has become cheaper does not make it acceptable to use that code as a vehicle for eroding freedom. If anything, as the friction of reimplementation disappears, so does the friction of stripping copyleft from anything left exposed. The erosion of enforcement capacity is a legal problem. It does not touch the underlying normative judgment.
That judgment is this: those who take from the commons owe something back to the commons. The principle does not change depending on whether a reimplementation takes five years or five days. No court ruling on AI-generated code will alter its social weight.
This is where law and community norms diverge. Law is made slowly, after the fact, reflecting existing power arrangements. The norms that open source communities built over decades did not wait for court approval. People chose the GPL when the law offered them no guarantee of its enforcement, because it expressed the values of the communities they wanted to belong to. Those values do not expire when the law changes.
In previous writing, I argued for a training copyleft (TGPL) as the next step in this line of development. The chardet situation suggests the argument has to go further: to a specification copyleft covering the layer below source code. If source code can now be generated from a specification, the specification is where the essential intellectual content of a GPL project resides. Blanchard's own claim—that he worked only from the test suite and API without reading the source—is, paradoxically, an argument for protecting that test suite and API specification under copyleft terms.
The history of the GPL is the history of licensing tools evolving in response to new forms of exploitation: GPLv2 to GPLv3, then AGPL. What drove each evolution was not a court ruling but a community reaching a value judgment first and then seeking legal instruments to express it. The same sequence is available now. Whatever courts eventually decide about AI reimplementation, the question we need to answer first is not a legal one. It is a social one. Do those who take from the commons owe something back? I think they do. That judgment does not require a verdict.
What makes the pieces by antirez and Ronacher worth reading is not that they are right. It is that they make visible, with unusual clarity, what they are choosing not to see. When legality is used as a substitute for a value judgment, the question that actually matters gets buried in the footnotes of a law it has already outgrown.

106
README.md Normal file

@@ -0,0 +1,106 @@
# Claude Code Python Porting Workspace
> The primary `src/` tree in this repository is now dedicated to **Python porting work**. The March 31, 2026 Claude Code source exposure is part of the project's background, but the tracked repository is now centered on Python source rather than the exposed TypeScript snapshot.
---
## Porting Status
The main source tree is now Python-first.
- `src/` contains the active Python porting workspace
- `tests/` verifies the current Python workspace
- the exposed snapshot is no longer part of the tracked repository state
The current Python workspace is not yet a complete one-to-one replacement for the original system, but the primary implementation surface is now Python.
## Why this rewrite exists
I originally studied the exposed codebase to understand its harness, tool wiring, and agent workflow. After spending more time with the legal and ethical questions—and after reading the essay linked below—I did not want the exposed snapshot itself to remain the main tracked source tree.
This repository now focuses on Python porting work instead.
## Repository Layout
```text
.
├── src/ # Python porting workspace
│ ├── __init__.py
│ ├── commands.py
│ ├── main.py
│ ├── models.py
│ ├── port_manifest.py
│ ├── query_engine.py
│ ├── task.py
│ └── tools.py
├── tests/ # Python verification
├── assets/omx/ # OmX workflow screenshots
├── 2026-03-09-is-legal-the-same-as-legitimate-ai-reimplementation-and-the-erosion-of-copyleft.md
└── README.md
```
## Python Workspace Overview
The new Python `src/` tree currently provides:
- **`port_manifest.py`** — summarizes the current Python workspace structure
- **`models.py`** — dataclasses for subsystems, modules, and backlog state
- **`commands.py`** — Python-side command port metadata
- **`tools.py`** — Python-side tool port metadata
- **`query_engine.py`** — renders a Python porting summary from the active workspace
- **`main.py`** — a CLI entrypoint for manifest and summary output
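
As a rough illustration of what the manifest step summarizes, the sketch below counts `.py` files per top-level entry under a directory. It is a minimal, self-contained approximation and not the repository's actual module: the helper name `count_top_level_python` and the temporary demo tree are assumptions made for this example.

```python
from collections import Counter
from pathlib import Path
import tempfile


def count_top_level_python(root: Path) -> Counter:
    """Count .py files grouped by their first path component under root.

    Hypothetical helper for illustration; not part of the repository.
    """
    counter: Counter = Counter()
    for path in root.rglob('*.py'):
        parts = path.relative_to(root).parts
        if not path.is_file() or '__pycache__' in parts:
            continue
        # Files directly under root are keyed by file name,
        # nested files by their top-level directory name.
        counter[parts[0]] += 1
    return counter


def demo() -> Counter:
    """Build a throwaway tree (assumed layout) and count its Python files."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / 'main.py').write_text('')
        (root / 'pkg').mkdir()
        (root / 'pkg' / 'a.py').write_text('')
        (root / 'pkg' / 'b.py').write_text('')
        return count_top_level_python(root)


if __name__ == '__main__':
    for name, count in demo().most_common():
        print(name, count)
```

Keying by the first path component means a package directory is reported as a single entry with its file count, while loose top-level files are listed individually.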
## Quickstart
Render the Python porting summary:
```bash
python3 -m src.main summary
```
Print the current Python workspace manifest:
```bash
python3 -m src.main manifest
```
List the current Python modules:
```bash
python3 -m src.main subsystems --limit 16
```
Run verification:
```bash
python3 -m unittest discover -s tests -v
```
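
The commands above are subcommands of a single `argparse` CLI. The sketch below shows that dispatch pattern in isolation. Only the subcommand names and the `--limit` option come from this repository; the `run` function and the strings it returns are placeholders assumed for the example.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three subcommands; exactly one must be given."""
    parser = argparse.ArgumentParser(description='illustrative porting CLI')
    subparsers = parser.add_subparsers(dest='command', required=True)
    subparsers.add_parser('summary')
    subparsers.add_parser('manifest')
    list_parser = subparsers.add_parser('subsystems')
    list_parser.add_argument('--limit', type=int, default=16)
    return parser


def run(argv: list[str]) -> str:
    """Parse argv and return a placeholder description of the chosen command."""
    args = build_parser().parse_args(argv)
    if args.command == 'subsystems':
        return f'subsystems (limit={args.limit})'
    return args.command


if __name__ == '__main__':
    print(run(['subsystems', '--limit', '3']))
```

Passing `argv` explicitly, rather than reading `sys.argv` inside the function, keeps the dispatch easy to exercise from tests.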
## Related Essay
- [*Is legal the same as legitimate: AI reimplementation and the erosion of copyleft*](https://writings.hongminhee.org/2026/03/legal-vs-legitimate/)
The essay is dated **March 9, 2026**, so it should be read as companion analysis that predates the **March 31, 2026** source exposure that motivated this rewrite direction.
## Built with `oh-my-codex`
The restructuring and documentation work on this repository was AI-assisted and orchestrated with Yeachan Heo's [oh-my-codex (OmX)](https://github.com/Yeachan-Heo/oh-my-codex), layered on top of Codex.
- **`$team` mode:** used for coordinated parallel review and architectural feedback
- **`$ralph` mode:** used for persistent execution, verification, and completion discipline
- **Codex-driven workflow:** used to turn the main `src/` tree into a Python-first porting workspace
### OmX workflow screenshots
![OmX workflow screenshot 1](assets/omx/omx-readme-review-1.png)
*Ralph/team orchestration view while the README and essay context were being reviewed in terminal panes.*
![OmX workflow screenshot 2](assets/omx/omx-readme-review-2.png)
*Split-pane review and verification flow during the final README wording pass.*
## Ownership / Affiliation Disclaimer
- This repository does **not** claim ownership of the original Claude Code source material.
- This repository is **not affiliated with, endorsed by, or maintained by Anthropic**.

Binary file not shown (image, 1.3 MiB).

Binary file not shown (image, 1.4 MiB).

16
src/__init__.py Normal file

@@ -0,0 +1,16 @@
"""Python porting workspace for the Claude Code rewrite effort."""

from .commands import PORTED_COMMANDS, build_command_backlog
from .port_manifest import PortManifest, build_port_manifest
from .query_engine import QueryEnginePort
from .tools import PORTED_TOOLS, build_tool_backlog

__all__ = [
    'PortManifest',
    'QueryEnginePort',
    'PORTED_COMMANDS',
    'PORTED_TOOLS',
    'build_command_backlog',
    'build_port_manifest',
    'build_tool_backlog',
]

13
src/commands.py Normal file

@@ -0,0 +1,13 @@
from __future__ import annotations

from .models import PortingBacklog, PortingModule

PORTED_COMMANDS = (
    PortingModule('main', 'Expose a Python CLI for manifest and backlog reporting', 'src/main.py', 'implemented'),
    PortingModule('summary', 'Render a Markdown overview of the current porting workspace', 'src/query_engine.py', 'implemented'),
    PortingModule('subsystems', 'List the current Python modules participating in the rewrite', 'src/port_manifest.py', 'implemented'),
)


def build_command_backlog() -> PortingBacklog:
    return PortingBacklog(title='Command surface', modules=list(PORTED_COMMANDS))

38
src/main.py Normal file

@@ -0,0 +1,38 @@
from __future__ import annotations

import argparse

from .port_manifest import build_port_manifest
from .query_engine import QueryEnginePort


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description='Python porting workspace for the Claude Code rewrite effort')
    subparsers = parser.add_subparsers(dest='command', required=True)
    subparsers.add_parser('summary', help='render a Markdown summary of the Python porting workspace')
    subparsers.add_parser('manifest', help='print the current Python workspace manifest')
    list_parser = subparsers.add_parser('subsystems', help='list the current Python modules in the workspace')
    list_parser.add_argument('--limit', type=int, default=16)
    return parser


def main(argv: list[str] | None = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    manifest = build_port_manifest()
    if args.command == 'summary':
        print(QueryEnginePort(manifest).render_summary())
        return 0
    if args.command == 'manifest':
        print(manifest.to_markdown())
        return 0
    if args.command == 'subsystems':
        for subsystem in manifest.top_level_modules[: args.limit]:
            print(f'{subsystem.name} {subsystem.file_count} {subsystem.notes}')
        return 0
    # parser.error() exits with status 2, so the return below is only a
    # defensive fallback and is not reached in normal operation.
    parser.error(f'unknown command: {args.command}')
    return 2


if __name__ == '__main__':
    raise SystemExit(main())

31
src/models.py Normal file

@@ -0,0 +1,31 @@
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Subsystem:
    name: str
    path: str
    file_count: int
    notes: str


@dataclass(frozen=True)
class PortingModule:
    name: str
    responsibility: str
    source_hint: str
    status: str = 'planned'


@dataclass
class PortingBacklog:
    title: str
    modules: list[PortingModule] = field(default_factory=list)

    def summary_lines(self) -> list[str]:
        return [
            f'- {module.name} [{module.status}] — {module.responsibility} (from {module.source_hint})'
            for module in self.modules
        ]

52
src/port_manifest.py Normal file

@@ -0,0 +1,52 @@
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from pathlib import Path

from .models import Subsystem

DEFAULT_SRC_ROOT = Path(__file__).resolve().parent


@dataclass(frozen=True)
class PortManifest:
    src_root: Path
    total_python_files: int
    top_level_modules: tuple[Subsystem, ...]

    def to_markdown(self) -> str:
        lines = [
            f'Port root: `{self.src_root}`',
            f'Total Python files: **{self.total_python_files}**',
            '',
            'Top-level Python modules:',
        ]
        for module in self.top_level_modules:
            lines.append(f'- `{module.name}` ({module.file_count} files) — {module.notes}')
        return '\n'.join(lines)


def build_port_manifest(src_root: Path | None = None) -> PortManifest:
    root = src_root or DEFAULT_SRC_ROOT
    # Skip anything under a __pycache__ directory. The original filter compared
    # each file's name to '__pycache__', which can never match a '*.py' path,
    # and was applied only to the per-module counter, not to the total.
    files = [
        path
        for path in root.rglob('*.py')
        if path.is_file() and '__pycache__' not in path.relative_to(root).parts
    ]
    # Key nested files by their top-level directory and loose files by name.
    counter = Counter(
        path.relative_to(root).parts[0] if len(path.relative_to(root).parts) > 1 else path.name
        for path in files
    )
    notes = {
        '__init__.py': 'package export surface',
        'main.py': 'CLI entrypoint',
        'port_manifest.py': 'workspace manifest generation',
        'query_engine.py': 'port orchestration summary layer',
        'commands.py': 'command backlog metadata',
        'tools.py': 'tool backlog metadata',
        'models.py': 'shared dataclasses',
        'task.py': 'task-level planning structures',
    }
    modules = tuple(
        Subsystem(name=name, path=f'src/{name}', file_count=count, notes=notes.get(name, 'Python port support module'))
        for name, count in counter.most_common()
    )
    return PortManifest(src_root=root, total_python_files=len(files), top_level_modules=modules)

32
src/query_engine.py Normal file

@@ -0,0 +1,32 @@
from __future__ import annotations

from dataclasses import dataclass

from .commands import build_command_backlog
from .port_manifest import PortManifest, build_port_manifest
from .tools import build_tool_backlog


@dataclass
class QueryEnginePort:
    manifest: PortManifest

    @classmethod
    def from_workspace(cls) -> 'QueryEnginePort':
        return cls(manifest=build_port_manifest())

    def render_summary(self) -> str:
        command_backlog = build_command_backlog()
        tool_backlog = build_tool_backlog()
        sections = [
            '# Python Porting Workspace Summary',
            '',
            self.manifest.to_markdown(),
            '',
            f'{command_backlog.title}:',
            *command_backlog.summary_lines(),
            '',
            f'{tool_backlog.title}:',
            *tool_backlog.summary_lines(),
        ]
        return '\n'.join(sections)

10
src/task.py Normal file

@@ -0,0 +1,10 @@
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PortingTask:
    title: str
    detail: str
    completed: bool = False

13
src/tools.py Normal file

@@ -0,0 +1,13 @@
from __future__ import annotations

from .models import PortingBacklog, PortingModule

PORTED_TOOLS = (
    PortingModule('port_manifest', 'Inspect the active Python source tree and summarize the current rewrite surface', 'src/port_manifest.py', 'implemented'),
    PortingModule('backlog_models', 'Represent subsystem and backlog metadata as Python dataclasses', 'src/models.py', 'implemented'),
    PortingModule('query_engine', 'Coordinate Python-facing rewrite summaries and reporting', 'src/query_engine.py', 'implemented'),
)


def build_tool_backlog() -> PortingBacklog:
    return PortingBacklog(title='Tool surface', modules=list(PORTED_TOOLS))


@@ -0,0 +1,34 @@
from __future__ import annotations

import subprocess
import sys
import unittest

from src.port_manifest import build_port_manifest
from src.query_engine import QueryEnginePort


class PortingWorkspaceTests(unittest.TestCase):
    def test_manifest_counts_python_files(self) -> None:
        manifest = build_port_manifest()
        self.assertGreaterEqual(manifest.total_python_files, 7)
        self.assertTrue(manifest.top_level_modules)

    def test_query_engine_summary_mentions_workspace(self) -> None:
        summary = QueryEnginePort.from_workspace().render_summary()
        self.assertIn('Python Porting Workspace Summary', summary)
        self.assertIn('Command surface', summary)
        self.assertIn('Tool surface', summary)

    def test_cli_summary_runs(self) -> None:
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'summary'],
            check=True,
            capture_output=True,
            text=True,
        )
        self.assertIn('Python Porting Workspace Summary', result.stdout)


if __name__ == '__main__':
    unittest.main()