Adding REUSE data/license headers to files

PJB · June 18, 2025, 10:39pm

We have plenty of downstreams that want to develop their servers under a different license from upstream. In the interest of making this process less legally dubious, it would probably be best if we started properly annotating upstream files with license info per-file.

The whispers I’m currently hearing are that REUSE is the tool for the job. There’s one thing I’m unsure about, though: do we need to backfill all the copyright information (somehow?), or can we just start every file off as “Space Station 14 Contributors” and call it a day?

PJB · June 18, 2025, 10:48pm

The REUSE FAQ says that you really should list all copyright holders/authors properly, even if you want to just go with the “contributors” catch-all.

On the other hand, projects like Rust, which I totally trust to be sanely developed, seemingly do not do this.

Personally, keeping track of copyright per file for every author seems an utterly futile and impractical affair, given how often code gets refactored and moved around. So I’m not sure this actually makes sense.

aquif · June 18, 2025, 10:56pm

Most open source copyright has never been tested in court. I remember reading, I think on the REUSE website (I can’t find it now), that git history does not hold up in court, which is why they recommend having an AUTHORS file if you don’t do per-file copyright. Honestly, I don’t think per-file copyright is that bad or ugly, and you can set up Rider to handle it automatically.
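If the AUTHORS-file route is taken, the file can be generated from git rather than maintained by hand. A minimal sketch, assuming `git` is on the PATH and that `git shortlog -sne` (commit counts, names, emails) is an acceptable source; the function names and the title line are illustrative, not an existing SS14 tool:

```python
import subprocess


def parse_shortlog(output: str) -> list[tuple[int, str]]:
    """Parse `git shortlog -sne` output into (commit_count, "Name <email>") pairs."""
    entries = []
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        count, _, identity = line.partition("\t")
        entries.append((int(count), identity.strip()))
    return entries


def render_authors(entries: list[tuple[int, str]], title: str) -> str:
    """Render an AUTHORS file body, most active contributors first."""
    ordered = sorted(entries, key=lambda e: (-e[0], e[1].lower()))
    return "\n".join([title, ""] + [identity for _, identity in ordered]) + "\n"


def authors_file(repo: str = ".", title: str = "Space Station 14 Contributors") -> str:
    # -s: commit counts only, -n: sort by count, -e: show emails.
    # Passing HEAD explicitly stops shortlog from trying to read a log from stdin.
    out = subprocess.run(
        ["git", "shortlog", "-sne", "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return render_authors(parse_shortlog(out), title)
```

Something like `Path("AUTHORS").write_text(authors_file())` would then refresh the file; the same person committing under several identities would still need a `.mailmap` entry to be merged into one line.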

PJB · June 18, 2025, 10:58pm

The problem to me seems that it’s completely impossible to actually maintain.

If I move code between two files, do I now need to look into the git history to see who originally created this code (which could be a dozen authors) and then splice only those authorship lines to the new file? This is an utterly impractical affair that will result in constant misattribution and seriously slow down development.

And I’m not sure it’s necessary. The MIT license doesn’t require credit, only that the license text be properly included, which it still would be.

crazybrain · June 18, 2025, 11:06pm

The copyright holder SHOULD be an individual, list of individuals, group, legal entity, or any other descriptor by which one can easily identify the copyright holder(s).

PJB · June 18, 2025, 11:08pm

Yes, I read that. Except I’ve already addressed above how this would be both utterly impossible to satisfy, and seemingly not something other respectable projects care about. So what gives?

deltanedas · June 18, 2025, 11:10pm

You are correct, the MIT license requires no attribution.

If you want to do per-file stuff, add an MIT header to everything and have a script update them at New Year.
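A yearly bump like that is a small regex pass. The sketch below assumes a hypothetical header of the form `Copyright (c) 2019-2024 Space Station 14 Contributors` in `.cs` files; the header format, file suffix and function names are assumptions, not an agreed convention:

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Hypothetical header shape: "Copyright (c) 2019-2024 Space Station 14 Contributors",
# or a single year such as "Copyright (c) 2024 Space Station 14 Contributors".
HEADER = re.compile(r"(Copyright \(c\) )(\d{4})(?:-(\d{4}))?")


def bump_year(text: str, year: int) -> str:
    """Extend the first copyright year (or year range) found in `text` to `year`."""
    def repl(m: re.Match) -> str:
        start = m.group(2)
        end = m.group(3) or start
        if int(end) >= year:
            return m.group(0)  # already current; leave the header alone
        return f"{m.group(1)}{start}-{year}"

    return HEADER.sub(repl, text, count=1)


def bump_tree(root: str, suffix: str = ".cs", year: Optional[int] = None) -> int:
    """Rewrite every matching file under `root`; returns how many files changed."""
    year = year or date.today().year
    changed = 0
    for path in Path(root).rglob(f"*{suffix}"):
        # Work on bytes so line endings and any UTF-8 BOM survive untouched.
        old = path.read_bytes().decode("utf-8")
        new = bump_year(old, year)
        if new != old:
            path.write_bytes(new.encode("utf-8"))
            changed += 1
    return changed
```

Because `bump_year` leaves already-current headers alone, running the script twice in the same year is harmless.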

crazybrain · June 18, 2025, 11:13pm

I feel listing individuals only works when there are no more than a few dozen contributors, and we have quite a few more than that. Remember, most open source projects aren’t this big.

Edit: we have about as many contributors as Python does.

crazybrain · June 18, 2025, 11:26pm

Which means any downstream using a specific commit hash (or date) to determine which files are under which license won’t hold up in court, because it’s based on commit history?

Bhijn · June 18, 2025, 11:28pm

As for file headers: a license/copyright notice header (whether SPDX or a verbatim copy-paste) would be a good idea at bare minimum, as this is something almost all open source projects (and even software companies with closed-source projects) do. Listing every individual contributor and keeping that up to date would be a Sisyphean task compared to just requiring that files have some form of valid license header (which for upstream would almost always be the current copyright notice and accompanying MIT license).
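Requiring "some form of valid license header" is easy to enforce in CI. A minimal sketch that only checks for an SPDX license tag near the top of each `.cs` file; the 10-line window, the suffix list and the function names are assumptions for illustration (the REUSE project's own `reuse lint` command does a far more thorough version of this):

```python
from pathlib import Path

TAG = "SPDX-License-Identifier:"


def has_license_header(text: str, max_lines: int = 10) -> bool:
    """True if an SPDX license tag appears in the first `max_lines` lines."""
    return any(TAG in line for line in text.splitlines()[:max_lines])


def missing_headers(root: str, suffixes: tuple[str, ...] = (".cs",)) -> list[Path]:
    """List source files under `root` that carry no license header."""
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            if not has_license_header(text):
                missing.append(path)
    return missing


def main(argv: list[str]) -> int:
    """CI entry point: print offenders and return a non-zero exit code if any."""
    root = argv[1] if len(argv) > 1 else "."
    bad = missing_headers(root)
    for path in bad:
        print(f"missing license header: {path}")
    return 1 if bad else 0
```

A CI step could call it as `sys.exit(main(sys.argv))` so that a PR adding an unannotated file fails the check.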

deltanedas · June 19, 2025, 12:04am

I wonder who told you that, but anyone using history like that would only be doing so for historical reference.
If they happened to use a viral license like GPL or AGPL, the entire project's code would fall under it, not just a few files.

deltanedas · June 19, 2025, 12:05am

Anyway, an SPDX header would be as simple as: // SPDX-License-Identifier: MIT
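Stamping that onto files that lack it could look like the sketch below. It assumes the catch-all holder "Space Station 14 Contributors" from the opening post and `//` comments; the function name is illustrative, and the REUSE helper tool's `reuse annotate` command covers the same job with more care:

```python
BOM = "\ufeff"


def add_spdx_header(text: str, license_id: str = "MIT",
                    holder: str = "Space Station 14 Contributors",
                    comment: str = "//") -> str:
    """Prepend REUSE-style SPDX tags unless the file already declares a license."""
    bom = BOM if text.startswith(BOM) else ""
    body = text[len(bom):]
    # Only look near the top, where a header would live.
    if any("SPDX-License-Identifier:" in line for line in body.splitlines()[:10]):
        return text
    header = (
        f"{comment} SPDX-FileCopyrightText: {holder}\n"
        f"{comment} SPDX-License-Identifier: {license_id}\n"
        "\n"
    )
    # Keep a UTF-8 byte-order mark, if present, in front of the new header.
    return bom + header + body
```

The function is idempotent, so a PR workflow can run it over every changed file without double-stamping anything.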

PJB · June 19, 2025, 12:08am

The entire project, taken together, would be subject to the terms of both licenses. This does not mean all the code in the project falls under the AGPL, and saying so mislabels the copyright and license of said files.

deltanedas · June 19, 2025, 12:11am

Which the MIT license has no issue with; there's no requirement to attribute individual files.
And all you have to do to remain compliant with both is not delete the copy of the MIT license.

PJB · June 19, 2025, 12:26am

Except you have now utterly demolished the ability to tell which code is even copyrighted to whom, making it impossible to properly work with the code and its licensing in the future. This may be practical and sufficient if the code never leaves your repo again, but we both know that’s not how SS13’s development works, and it’s a grossly irresponsible way to go about development.

deltanedas · June 19, 2025, 12:39am

Any halfway serious fork has its code in a dedicated module, or at least a subdirectory.

PJB · June 19, 2025, 12:41am

This is clearly not the licensing method many downstreams use, as their READMEs give a vague “code after this date is AGPL”. I do not see why this point is relevant. If it were, it would be trivial for downstreams to comply already and there wouldn’t be an issue.

GoobStation · June 19, 2025, 7:00am

REUSE has a number of issues:

Would contributors making trivial changes (e.g. two characters) be credited as copyright holders?
Does refactored or rewritten code still list the original authors? If not, how can we ensure that the refactored code is actually a clean-room implementation?
Going along with that, how do we actually determine when code has been changed enough to remove the original authors?
If code is split, merged, or moved across files, do we have to copy over headers?
How do we retroactively add headers? Do we manually go through the commit history and determine whether someone made a meaningful enough contribution to be listed as an author?
Do we blanket all files as “Space Wizard Federation” or “Space Station 14 Contributors”?
Files that do not support headers are supposed to have a companion <filename>.license file next to them (e.g. icon.png.license). Would this be implemented, or are the existing copyright fields in RSI and sound configs enough?
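For files that cannot carry a comment, REUSE's convention is a plain-text companion file named after the original plus `.license`, holding the same SPDX tags. A sketch of generating one; the set of "comment-less" extensions, the function names and the license identifier used in the example are assumptions:

```python
from pathlib import Path
from typing import Optional, Tuple

# Assumption: formats that cannot hold a comment header in this repo.
NO_COMMENT_SUFFIXES = {".png", ".ogg", ".wav"}


def sidecar_for(path: str, holder: str, license_id: str) -> Optional[Tuple[str, str]]:
    """Return (sidecar_path, contents) for a file that needs a REUSE .license companion."""
    if Path(path).suffix.lower() not in NO_COMMENT_SUFFIXES:
        return None  # text formats should get an in-file header instead
    contents = (
        f"SPDX-FileCopyrightText: {holder}\n"
        f"SPDX-License-Identifier: {license_id}\n"
    )
    # REUSE convention: the companion keeps the full original name plus ".license".
    return f"{path}.license", contents


def write_sidecar(path: str, holder: str, license_id: str) -> bool:
    """Create the companion file unless one already exists; True if written."""
    result = sidecar_for(path, holder, license_id)
    if result is None or Path(result[0]).exists():
        return False
    sidecar, contents = result
    Path(sidecar).write_text(contents, encoding="utf-8")
    return True
```

Whether this duplicates the copyright fields already in RSI `meta.json` files is exactly the open question above; the sketch does not try to read them.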

My solution on Goob was a script that retroactively goes through the repo, uses git blame to generate headers, and then sets all core content files to MIT + AGPL (sublicense) and all fork folders to AGPL only.
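The git-blame approach can be sketched as follows. This is not Goob's actual script; it assumes `git blame --line-porcelain` output (which repeats the author fields for every line), keys authors on name plus email, and adds a hypothetical `min_lines` cut-off as one possible answer to the "trivial changes" question:

```python
import subprocess
from collections import defaultdict
from datetime import datetime, timezone


def blame_authors(porcelain: str) -> dict:
    """Tally surviving lines and years per author from `git blame --line-porcelain`."""
    stats = defaultdict(lambda: {"lines": 0, "years": set()})
    name = mail = ""
    for line in porcelain.splitlines():
        if line.startswith("\t"):
            continue  # the blamed source line itself, never metadata
        if line.startswith("author "):
            name = line[len("author "):]
        elif line.startswith("author-mail "):
            mail = line[len("author-mail "):]
        elif line.startswith("author-time "):
            # author-time follows author/author-mail, so the identity is complete here.
            ts = int(line.split()[1])
            who = f"{name} {mail}"
            stats[who]["lines"] += 1
            stats[who]["years"].add(datetime.fromtimestamp(ts, tz=timezone.utc).year)
    return stats


def header_lines(stats: dict, min_lines: int = 1) -> list:
    """Format SPDX-FileCopyrightText lines, most lines first, dropping tiny contributions."""
    out = []
    for who, s in sorted(stats.items(), key=lambda kv: (-kv[1]["lines"], kv[0])):
        if s["lines"] < min_lines:
            continue
        lo, hi = min(s["years"]), max(s["years"])
        years = str(lo) if lo == hi else f"{lo}-{hi}"
        out.append(f"SPDX-FileCopyrightText: {years} {who}")
    return out


def blame_file(path: str) -> str:
    """Run git blame; --line-porcelain repeats the author fields on every line."""
    return subprocess.run(
        ["git", "blame", "--line-porcelain", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
```

Usage would be along the lines of `header_lines(blame_authors(blame_file("Foo.cs")), min_lines=5)`. Note that blame only credits surviving lines, so moved or reformatted code is attributed to whoever last touched it; git's `-M`/`-C` blame options reduce that for moved code but do not eliminate it.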

This is then supplemented with a workflow that automatically adds headers to new PRs and lets users specify a secondary license if they wish to DUAL (not sub) license their code alongside AGPL on explicitly new files, with the intention of making it easier for forks to port content. Relicensing previously made content requires a PR made manually, with permission from the relevant copyright holders.
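In SPDX terms, a dual license is an `OR` expression (the recipient may pick either license), whereas `AND` means both licenses apply at once. A sketch of how such a workflow might choose the tag for each touched file; the AGPL variant (`AGPL-3.0-or-later`) and the function name are assumptions, since the post does not say which identifiers are used:

```python
from typing import Optional

# Assumption: the post does not name the exact AGPL variant in use.
BASE = "AGPL-3.0-or-later"


def license_expression(existing: Optional[str], secondary: Optional[str] = None,
                       base: str = BASE) -> str:
    """Choose the SPDX-License-Identifier value for a file touched by a PR."""
    if existing is not None:
        # Existing files keep their tag; relicensing goes through a separate manual PR.
        return existing
    if secondary and secondary != base:
        # OR = dual license: the recipient may use the file under either license.
        return f"{base} OR {secondary}"
    return base
```

For a brand-new file whose author opts into MIT, this yields `AGPL-3.0-or-later OR MIT`, which is what would let a fork port that file under MIT alone.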

This is not perfect, but it’s a good start.
The only “real” solution that covers all of this is to write strict guidelines for headers, manually go through previous files to add them, and require all new PRs to follow the guidelines, but this is not realistically feasible.

Simyon · June 19, 2025, 12:40pm

My one concern regarding adding license headers to all of our files is that it would absolutely fuck the git history when using the GitHub UI to check when a file was last touched, who touched it, and what happened to it.

PJB · June 19, 2025, 1:48pm

This isn’t really a problem. We gotta do what we gotta do. If you want to find out who changed what, you should be using blame anyway.