Make the repository's primary source tree genuinely Python

The old tracked TypeScript snapshot has been removed from the repository history and the root directory is now a Python porting workspace. README and tests now describe and verify the Python-first layout instead of treating the exposed snapshot as the active source tree. A local archive can still exist outside Git, but the tracked repository now presents only the Python porting surface, related essay context, and OmX workflow artifacts.

Constraint: Tracked history should collapse to a single commit while excluding the archived snapshot from Git
Rejected: Keep the exposed TypeScript tree in tracked history under an archive path | user explicitly wanted only the Python porting repo state in Git
Confidence: medium
Scope-risk: broad
Reversibility: messy
Directive: Keep future tracked additions focused on the Python port itself; do not reintroduce the exposed snapshot into Git history
Tested: python3 -m unittest discover -s tests -v; python3 -m src.main summary; git diff --check
Not-tested: Behavioral parity with the original TypeScript system beyond the current Python workspace surface
2026-03-31 22:02:32 +08:00 · 2026-03-31 05:38:29 -07:00
commit 7c3c5f7eb9
14 changed files with 435 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,2 @@
 __pycache__/
 archive/
--- a/2026-03-09-is-legal-the-same-as-legitimate-ai-reimplementation-and-the-erosion-of-copyleft.md
+++ b/2026-03-09-is-legal-the-same-as-legitimate-ai-reimplementation-and-the-erosion-of-copyleft.md
@@ -0,0 +1,88 @@
 # Is legal the same as legitimate: AI reimplementation and the erosion of copyleft
 - **Date:** March 9, 2026
 - **Author:** Hong Minhee
 - **Source context:** _Hong Minhee on Things_ (English / 日本語 / 朝鮮語 (國漢文) / 한국어 (한글))
 - **Archive note:** This copy was normalized from user-provided text for this repository's research/archive context. Site navigation/footer language links were converted into metadata.
 Last week, Dan Blanchard, the maintainer of chardet—a Python library for detecting text encodings used by roughly 130 million projects a month—released a new version. Version 7.0 is 48 times faster than its predecessor, supports multiple cores, and was redesigned from the ground up. Anthropic's Claude is listed as a contributor. The license changed from LGPL to MIT.
 Blanchard's account is that he never looked at the existing source code directly. He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch. The resulting code shares less than 1.3% similarity with any prior version, as measured by JPlag. His conclusion: this is an independent new work, and he is under no obligation to carry forward the LGPL. Mark Pilgrim, the library's original author, opened a GitHub issue to object. The LGPL requires that modifications be distributed under the same license, and a reimplementation produced with ample exposure to the original codebase cannot, in Pilgrim's view, pass as a clean-room effort.
 The dispute drew responses from two prominent figures in the open source world. Armin Ronacher, the creator of Flask, welcomed the relicensing. Salvatore Sanfilippo (antirez), the creator of Redis, published a broader defense of AI reimplementation, grounding it in copyright law and the history of the GNU project. Both conclude, by different routes, that what Blanchard did is legitimate. I respect both writers, and I think both are wrong—or more precisely, both are evading the question that actually matters.
 That question is this: does legal mean legitimate? Neither piece answers it. Both move from “this is legally permissible” to “this is therefore fine,” without pausing at the gap between those two claims. Law sets a floor; clearing it does not mean the conduct is right. That gap is where this essay begins.
 ## The analogy points the wrong way
 Antirez builds his case on history. When the GNU project reimplemented the UNIX userspace, it was lawful. So was Linux. Copyright law prohibits copying “protected expressions”—the actual code, its structure, its specific mechanisms—but it does not protect ideas or behavior. AI-assisted reimplementation occupies the same legal ground. Therefore, it is lawful.
 The legal analysis is largely correct, and I am not disputing it. The problem lies in what antirez does next: he presents the legal conclusion as if it were also a social one, and uses a historical analogy that, examined more carefully, argues against his own position.
 When GNU reimplemented the UNIX userspace, the vector ran from proprietary to free. Stallman was using the limits of copyright law to turn proprietary software into free software. The ethical force of that project did not come from its legal permissibility—it came from the direction it was moving, from the fact that it was expanding the commons. That is why people cheered.
 The vector in the chardet case runs the other way. Software protected by a copyleft license—one that guarantees users the right to study, modify, and redistribute derivative works under the same terms—has been reimplemented under a permissive license that carries no such guarantee. This is not a reimplementation that expands the commons. It is one that removes the fencing that protected the commons. Derivative works built on chardet 7.0 are under no obligation to share their source code. That obligation, which applied to a library downloaded 130 million times a month, is now gone.
 Antirez does not address this directional difference. He invokes the GNU precedent, but that precedent is a counterexample to his conclusion, not a supporting one.
 ## Does the GPL work against sharing?
 Ronacher's argument is different. He discloses upfront that he has a stake in the outcome: “I personally have a horse in the race here because I too wanted chardet to be under a non-GPL license for many years. So consider me a very biased person in that regard.” He goes on to write that he considers “the GPL to run against that spirit by restricting what can be done with it”—the spirit being that society is better off when we share.
 This claim rests on a fundamental misreading of what the GPL does.
 Start with what the GPL actually prohibits. It does not prohibit keeping source code private. It imposes no constraint on privately modifying GPL software and using it yourself. The GPL's conditions are triggered only by distribution. If you distribute modified code, or offer it as a networked service, you must make the source available under the same terms. This is not a restriction on sharing. It is a condition placed on sharing: if you share, you must share in kind.
 The requirement that improvements be returned to the commons is not a mechanism that suppresses sharing. It is a mechanism that makes sharing recursive and self-reinforcing. The claim that imposing contribution obligations on users of a commons undermines sharing culture does not hold together logically.
 The contrast with the MIT license clarifies the point. Under MIT, anyone may take code, improve it, and close it off into a proprietary product. You can receive from the commons without giving back. If Ronacher calls this structure “more share-friendly,” he is using a concept of sharing with a specific directionality built in: sharing flows toward whoever has more capital and more engineers to take advantage of it.
 The historical record bears this out. In the 1990s, companies routinely absorbed GPL code into proprietary products—not because they had chosen permissive licenses, but because copyleft enforcement was slack. The strengthening of copyleft mechanisms closed that gap. For individual developers and small projects without the resources to compete on anything but reciprocity, copyleft was what made the exchange approximately fair.
 The creator of Flask knows this distinction. If he elides it anyway, the argument is not naïve—it is convenient.
 ## A self-refuting example
 The most interesting moment in Ronacher's piece is not the argument but a detail he mentions in passing: Vercel reimplemented GNU Bash using AI and published it, then got visibly upset when Cloudflare reimplemented Next.js the same way.
 Ronacher notes this as an irony and moves on. But the irony cuts deeper than he lets on. Next.js is MIT licensed. Cloudflare's vinext did not violate any license—it did exactly what Ronacher calls a contribution to the culture of openness, applied to a permissively licensed codebase. Vercel's reaction had nothing to do with license infringement; it was purely competitive and territorial. The implicit position is: reimplementing GPL software as MIT is a victory for sharing, but having our own MIT software reimplemented by a competitor is cause for outrage. This is what the claim that permissive licensing is “more share-friendly” than copyleft looks like in practice. The spirit of sharing, it turns out, runs in one direction only: outward from oneself.
 Ronacher registers the contradiction and does not stop. “This development plays into my worldview,” he writes. When you present evidence that cuts against your own position, acknowledge it, and then proceed to your original conclusion unchanged, that is a signal that the conclusion preceded the argument.
 ## Legality and social legitimacy are different registers
 Back to the question posed at the start. Is legal the same as legitimate?
 Antirez closes his careful legal analysis as though it settles the matter. Ronacher acknowledges that “there is an obvious moral question here, but that isn't necessarily what I'm interested in.” Both pieces treat legal permissibility as a proxy for social legitimacy. But law only says what conduct it will not prevent—it does not certify that conduct as right. Aggressive tax minimization that never crosses into illegality may still be widely regarded as antisocial. A pharmaceutical company that legally acquires a patent on a long-generic drug and raises the price a hundredfold has done something legal, but that does not make it fine. Legality is a necessary condition; it is not a sufficient one.
 In the chardet case, the distinction is sharper still. What the LGPL protected was not Blanchard's labor alone. It was a social compact agreed to by everyone who contributed to the library over twelve years. The terms of that compact were: if you take this and build on it, you share back under the same terms. This compact operated as a legal instrument, yes, but it was also the foundation of trust that made contribution rational. The fact that a reimplementation may qualify legally as a new work, and the fact that it breaks faith with the original contributors, are separate questions. If a court eventually rules in Blanchard's favor, that ruling will tell us what the law permits. It will not tell us that the act was right.
 Zoë Kooyman, executive director of the FSF, put it plainly: “Refusing to grant others the rights you yourself received as a user is highly antisocial, no matter what method you use.”
 ## Whose perspective is the default?
 Reading this debate, I keep returning to a question about position. From where are these two writers looking at the situation?
 Antirez created Redis. Ronacher created Flask. Both are figures at the center of the open source ecosystem, with large audiences and well-established reputations. For them, falling costs of AI reimplementation means something specific: it is easier to reimplement things they want in a different form. Ronacher says explicitly that he had begun reimplementing GNU Readline precisely because of its copyleft terms.
 For the people who have spent years contributing to a library like chardet, the same shift in costs means something else entirely: the copyleft protection around their contributions can be removed. The two writers are speaking from the former position to people in the latter, telling them that this was always lawful, that historical precedent supports it, and that the appropriate response is adaptation.
 When positional asymmetry of this kind is ignored, and the argument is presented as universal analysis, what you get is not analysis but rationalization. Both writers arrive at conclusions that align precisely with their own interests. Readers should hold that fact in mind.
 ## What this fight points toward
 Bruce Perens, who wrote the original Open Source Definition, told The Register: “The entire economics of software development are dead, gone, over, kaput!” He meant it as an alarm. Antirez, from a similar assessment of the situation, draws the conclusion: adapt. Ronacher says he finds the direction exciting.
 None of the three responses addresses the central question. When copyleft becomes technically easier to circumvent, does that make it less necessary, or more?
 I think more. What the GPL protected was not the scarcity of code but the freedom of users. The fact that producing code has become cheaper does not make it acceptable to use that code as a vehicle for eroding freedom. If anything, as the friction of reimplementation disappears, so does the friction of stripping copyleft from anything left exposed. The erosion of enforcement capacity is a legal problem. It does not touch the underlying normative judgment.
 That judgment is this: those who take from the commons owe something back to the commons. The principle does not change depending on whether a reimplementation takes five years or five days. No court ruling on AI-generated code will alter its social weight.
 This is where law and community norms diverge. Law is made slowly, after the fact, reflecting existing power arrangements. The norms that open source communities built over decades did not wait for court approval. People chose the GPL when the law offered them no guarantee of its enforcement, because it expressed the values of the communities they wanted to belong to. Those values do not expire when the law changes.
 In previous writing, I argued for a training copyleft (TGPL) as the next step in this line of development. The chardet situation suggests the argument has to go further: to a specification copyleft covering the layer below source code. If source code can now be generated from a specification, the specification is where the essential intellectual content of a GPL project resides. Blanchard's own claim—that he worked only from the test suite and API without reading the source—is, paradoxically, an argument for protecting that test suite and API specification under copyleft terms.
 The history of the GPL is the history of licensing tools evolving in response to new forms of exploitation: GPLv2 to GPLv3, then AGPL. What drove each evolution was not a court ruling but a community reaching a value judgment first and then seeking legal instruments to express it. The same sequence is available now. Whatever courts eventually decide about AI reimplementation, the question we need to answer first is not a legal one. It is a social one. Do those who take from the commons owe something back? I think they do. That judgment does not require a verdict.
 What makes the pieces by antirez and Ronacher worth reading is not that they are right. It is that they make visible, with unusual clarity, what they are choosing not to see. When legality is used as a substitute for a value judgment, the question that actually matters gets buried in the footnotes of a law it has already outgrown.
--- a/README.md
+++ b/README.md
@@ -0,0 +1,106 @@
 # Claude Code Python Porting Workspace
 > The primary `src/` tree in this repository is now dedicated to **Python porting work**. The March 31, 2026 Claude Code source exposure is part of the project's background, but the tracked repository is now centered on Python source rather than the exposed TypeScript snapshot.
 ---
 ## Porting Status
 The main source tree is now Python-first.
 - `src/` contains the active Python porting workspace
 - `tests/` verifies the current Python workspace
 - the exposed snapshot is no longer part of the tracked repository state
 The current Python workspace is not yet a complete one-to-one replacement for the original system, but the primary implementation surface is now Python.
 ## Why this rewrite exists
 I originally studied the exposed codebase to understand its harness, tool wiring, and agent workflow. After spending more time with the legal and ethical questions—and after reading the essay linked below—I did not want the exposed snapshot itself to remain the main tracked source tree.
 This repository now focuses on Python porting work instead.
 ## Repository Layout
 ```text
 .
 ├── src/                                # Python porting workspace
 │   ├── __init__.py
 │   ├── commands.py
 │   ├── main.py
 │   ├── models.py
 │   ├── port_manifest.py
 │   ├── query_engine.py
 │   ├── task.py
 │   └── tools.py
 ├── tests/                              # Python verification
 ├── assets/omx/                         # OmX workflow screenshots
 ├── 2026-03-09-is-legal-the-same-as-legitimate-ai-reimplementation-and-the-erosion-of-copyleft.md
 └── README.md
 ```
 ## Python Workspace Overview
 The new Python `src/` tree currently provides:
 - **`port_manifest.py`** — summarizes the current Python workspace structure
 - **`models.py`** — dataclasses for subsystems, modules, and backlog state
 - **`commands.py`** — Python-side command port metadata
 - **`tools.py`** — Python-side tool port metadata
 - **`query_engine.py`** — renders a Python porting summary from the active workspace
 - **`main.py`** — a CLI entrypoint for manifest and summary output
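As a rough illustration of what `port_manifest.py` summarizes, the sketch below groups `*.py` files by their first path component. It is a minimal, self-contained approximation rather than the module itself: the helper name `count_top_level_python_entries` and the throwaway file names are assumptions made for this example.

```python
from __future__ import annotations

import tempfile
from collections import Counter
from pathlib import Path


def count_top_level_python_entries(root: Path) -> Counter:
    """Count *.py files under root, grouped by their first path component."""
    counts: Counter = Counter()
    for path in root.rglob('*.py'):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        # Skip anything that lives inside a bytecode cache directory.
        if '__pycache__' in parts:
            continue
        # Files directly under root are keyed by file name; nested files
        # are keyed by the top-level directory that contains them.
        key = parts[0] if len(parts) > 1 else path.name
        counts[key] += 1
    return counts


# Build a throwaway tree (hypothetical file names) to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'main.py').write_text('')
    (root / 'models.py').write_text('')
    (root / 'helpers').mkdir()
    (root / 'helpers' / 'a.py').write_text('')
    (root / 'helpers' / 'b.py').write_text('')
    (root / 'notes.txt').write_text('')
    counts = count_top_level_python_entries(root)

print(sorted(counts.items()))
```

Keying flat files by name and nested files by directory keeps the listing one level deep, which is enough for a workspace that is currently a single flat package.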
 ## Quickstart
 Render the Python porting summary:
 ```bash
 python3 -m src.main summary
 ```
 Print the current Python workspace manifest:
 ```bash
 python3 -m src.main manifest
 ```
 List the current Python modules:
 ```bash
 python3 -m src.main subsystems --limit 16
 ```
 Run verification:
 ```bash
 python3 -m unittest discover -s tests -v
 ```
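The subcommands above follow the usual `argparse` subparser pattern. The sketch below is a simplified stand-in, not the actual `src/main.py`: the `run` helper and the module names passed to it are hypothetical, and it only shows how `--limit` truncates the listing.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one required subcommand, mirroring the CLI shape."""
    parser = argparse.ArgumentParser(description='sketch of a subcommand CLI')
    subparsers = parser.add_subparsers(dest='command', required=True)
    subparsers.add_parser('summary')
    listing = subparsers.add_parser('subsystems')
    listing.add_argument('--limit', type=int, default=16)
    return parser


def run(argv: list[str], modules: list[str]) -> list[str]:
    """Return the lines a subcommand would print (hypothetical helper)."""
    args = build_parser().parse_args(argv)
    if args.command == 'subsystems':
        # Slicing never raises, even if --limit exceeds the list length.
        return modules[: args.limit]
    return [f'{len(modules)} modules']


print(run(['subsystems', '--limit', '2'], ['main.py', 'models.py', 'tools.py']))
print(run(['summary'], ['main.py', 'models.py', 'tools.py']))
```

Returning lines instead of printing inside `run` is a choice made for this sketch so the behavior is easy to check; the real entry point may be structured differently.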
 ## Related Essay
 - [*Is legal the same as legitimate: AI reimplementation and the erosion of copyleft*](https://writings.hongminhee.org/2026/03/legal-vs-legitimate/)
 The essay is dated **March 9, 2026**, so it should be read as companion analysis that predates the **March 31, 2026** source exposure that motivated this rewrite direction.
 ## Built with `oh-my-codex`
 The restructuring and documentation work on this repository was AI-assisted and orchestrated with Yeachan Heo's [oh-my-codex (OmX)](https://github.com/Yeachan-Heo/oh-my-codex), layered on top of Codex.
 - **`$team` mode:** used for coordinated parallel review and architectural feedback
 - **`$ralph` mode:** used for persistent execution, verification, and completion discipline
 - **Codex-driven workflow:** used to turn the main `src/` tree into a Python-first porting workspace
 ### OmX workflow screenshots
 ![OmX workflow screenshot 1](assets/omx/omx-readme-review-1.png)
 *Ralph/team orchestration view while the README and essay context were being reviewed in terminal panes.*
 ![OmX workflow screenshot 2](assets/omx/omx-readme-review-2.png)
 *Split-pane review and verification flow during the final README wording pass.*
 ## Ownership / Affiliation Disclaimer
 - This repository does **not** claim ownership of the original Claude Code source material.
 - This repository is **not affiliated with, endorsed by, or maintained by Anthropic**.
--- a/assets/omx/omx-readme-review-1.png
+++ b/assets/omx/omx-readme-review-1.png
--- a/assets/omx/omx-readme-review-2.png
+++ b/assets/omx/omx-readme-review-2.png
--- a/src/__init__.py
+++ b/src/__init__.py
@@ -0,0 +1,16 @@
 """Python porting workspace for the Claude Code rewrite effort."""
 from .commands import PORTED_COMMANDS, build_command_backlog
 from .port_manifest import PortManifest, build_port_manifest
 from .query_engine import QueryEnginePort
 from .tools import PORTED_TOOLS, build_tool_backlog
 __all__ = [
    'PortManifest',
    'QueryEnginePort',
    'PORTED_COMMANDS',
    'PORTED_TOOLS',
    'build_command_backlog',
    'build_port_manifest',
    'build_tool_backlog',
 ]
--- a/src/commands.py
+++ b/src/commands.py
@@ -0,0 +1,13 @@
 from __future__ import annotations
 from .models import PortingBacklog, PortingModule
 PORTED_COMMANDS = (
    PortingModule('main', 'Expose a Python CLI for manifest and backlog reporting', 'src/main.py', 'implemented'),
    PortingModule('summary', 'Render a Markdown overview of the current porting workspace', 'src/query_engine.py', 'implemented'),
    PortingModule('subsystems', 'List the current Python modules participating in the rewrite', 'src/port_manifest.py', 'implemented'),
 )
 def build_command_backlog() -> PortingBacklog:
    return PortingBacklog(title='Command surface', modules=list(PORTED_COMMANDS))
--- a/src/main.py
+++ b/src/main.py
@@ -0,0 +1,38 @@
 from __future__ import annotations
 import argparse
 from .port_manifest import build_port_manifest
 from .query_engine import QueryEnginePort
 def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description='Python porting workspace for the Claude Code rewrite effort')
    subparsers = parser.add_subparsers(dest='command', required=True)
    subparsers.add_parser('summary', help='render a Markdown summary of the Python porting workspace')
    subparsers.add_parser('manifest', help='print the current Python workspace manifest')
    list_parser = subparsers.add_parser('subsystems', help='list the current Python modules in the workspace')
    list_parser.add_argument('--limit', type=int, default=16)
    return parser
 def main(argv: list[str] | None = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    manifest = build_port_manifest()
    if args.command == 'summary':
        print(QueryEnginePort(manifest).render_summary())
        return 0
    if args.command == 'manifest':
        print(manifest.to_markdown())
        return 0
    if args.command == 'subsystems':
        for subsystem in manifest.top_level_modules[: args.limit]:
            print(f'{subsystem.name}\t{subsystem.file_count}\t{subsystem.notes}')
        return 0
    parser.error(f'unknown command: {args.command}')
    return 2
 if __name__ == '__main__':
    raise SystemExit(main())
--- a/src/models.py
+++ b/src/models.py
@@ -0,0 +1,31 @@
 from __future__ import annotations
 from dataclasses import dataclass, field
 @dataclass(frozen=True)
 class Subsystem:
    name: str
    path: str
    file_count: int
    notes: str
 @dataclass(frozen=True)
 class PortingModule:
    name: str
    responsibility: str
    source_hint: str
    status: str = 'planned'
 @dataclass
 class PortingBacklog:
    title: str
    modules: list[PortingModule] = field(default_factory=list)
    def summary_lines(self) -> list[str]:
        return [
            f'- {module.name} [{module.status}] — {module.responsibility} (from {module.source_hint})'
            for module in self.modules
        ]
--- a/src/port_manifest.py
+++ b/src/port_manifest.py
@@ -0,0 +1,52 @@
 from __future__ import annotations
 from collections import Counter
 from dataclasses import dataclass
 from pathlib import Path
 from .models import Subsystem
 DEFAULT_SRC_ROOT = Path(__file__).resolve().parent
 @dataclass(frozen=True)
 class PortManifest:
    src_root: Path
    total_python_files: int
    top_level_modules: tuple[Subsystem, ...]
    def to_markdown(self) -> str:
        lines = [
            f'Port root: `{self.src_root}`',
            f'Total Python files: **{self.total_python_files}**',
            '',
            'Top-level Python modules:',
        ]
        for module in self.top_level_modules:
            lines.append(f'- `{module.name}` ({module.file_count} files) — {module.notes}')
        return '\n'.join(lines)
 def build_port_manifest(src_root: Path | None = None) -> PortManifest:
    root = src_root or DEFAULT_SRC_ROOT
    files = [path for path in root.rglob('*.py') if path.is_file()]
    counter = Counter(
        path.relative_to(root).parts[0] if len(path.relative_to(root).parts) > 1 else path.name
        for path in files
        if '__pycache__' not in path.relative_to(root).parts
    )
    notes = {
        '__init__.py': 'package export surface',
        'main.py': 'CLI entrypoint',
        'port_manifest.py': 'workspace manifest generation',
        'query_engine.py': 'port orchestration summary layer',
        'commands.py': 'command backlog metadata',
        'tools.py': 'tool backlog metadata',
        'models.py': 'shared dataclasses',
        'task.py': 'task-level planning structures',
    }
    modules = tuple(
        Subsystem(name=name, path=f'src/{name}', file_count=count, notes=notes.get(name, 'Python port support module'))
        for name, count in counter.most_common()
    )
    return PortManifest(src_root=root, total_python_files=len(files), top_level_modules=modules)
--- a/src/query_engine.py
+++ b/src/query_engine.py
@@ -0,0 +1,32 @@
 from __future__ import annotations
 from dataclasses import dataclass
 from .commands import build_command_backlog
 from .port_manifest import PortManifest, build_port_manifest
 from .tools import build_tool_backlog
 @dataclass
 class QueryEnginePort:
    manifest: PortManifest
    @classmethod
    def from_workspace(cls) -> 'QueryEnginePort':
        return cls(manifest=build_port_manifest())
    def render_summary(self) -> str:
        command_backlog = build_command_backlog()
        tool_backlog = build_tool_backlog()
        sections = [
            '# Python Porting Workspace Summary',
            '',
            self.manifest.to_markdown(),
            '',
            f'{command_backlog.title}:',
            *command_backlog.summary_lines(),
            '',
            f'{tool_backlog.title}:',
            *tool_backlog.summary_lines(),
        ]
        return '\n'.join(sections)
--- a/src/task.py
+++ b/src/task.py
@@ -0,0 +1,10 @@
 from __future__ import annotations
 from dataclasses import dataclass
 @dataclass(frozen=True)
 class PortingTask:
    title: str
    detail: str
    completed: bool = False
--- a/src/tools.py
+++ b/src/tools.py
@@ -0,0 +1,13 @@
 from __future__ import annotations
 from .models import PortingBacklog, PortingModule
 PORTED_TOOLS = (
    PortingModule('port_manifest', 'Inspect the active Python source tree and summarize the current rewrite surface', 'src/port_manifest.py', 'implemented'),
    PortingModule('backlog_models', 'Represent subsystem and backlog metadata as Python dataclasses', 'src/models.py', 'implemented'),
    PortingModule('query_engine', 'Coordinate Python-facing rewrite summaries and reporting', 'src/query_engine.py', 'implemented'),
 )
 def build_tool_backlog() -> PortingBacklog:
    return PortingBacklog(title='Tool surface', modules=list(PORTED_TOOLS))
--- a/tests/test_porting_workspace.py
+++ b/tests/test_porting_workspace.py
@@ -0,0 +1,34 @@
 from __future__ import annotations
 import subprocess
 import sys
 import unittest
 from src.port_manifest import build_port_manifest
 from src.query_engine import QueryEnginePort
 class PortingWorkspaceTests(unittest.TestCase):
    def test_manifest_counts_python_files(self) -> None:
        manifest = build_port_manifest()
        self.assertGreaterEqual(manifest.total_python_files, 7)
        self.assertTrue(manifest.top_level_modules)
    def test_query_engine_summary_mentions_workspace(self) -> None:
        summary = QueryEnginePort.from_workspace().render_summary()
        self.assertIn('Python Porting Workspace Summary', summary)
        self.assertIn('Command surface', summary)
        self.assertIn('Tool surface', summary)
    def test_cli_summary_runs(self) -> None:
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'summary'],
            check=True,
            capture_output=True,
            text=True,
        )
        self.assertIn('Python Porting Workspace Summary', result.stdout)
 if __name__ == '__main__':
    unittest.main()