Description
Hello,
I recently tried out your project and it works pretty well.
However, when I applied the tool to parts of the Linux kernel as a test, I noticed that many .cc and .h files do not get a license assigned. The problem seems to be related to how comments are delimited in C/C++ as opposed to Go or Python: in a C-style block comment, characters (the closing */) follow the license name.
How to reproduce
- Create a file containing only:
  /* SPDX-License-Identifier: GPL-2.0 */
- Run lc on that file. The result is NOASSERTION.
In contrast:
- Remove everything after GPL-2.0 (the trailing */).
- The result is GPL-2.0.
Problem
- The problem lies in parsers/guesser.go.
- The regular expression captures "GPL-2.0 */" instead of "GPL-2.0".
- Comparing the captured text with the known license name therefore fails, because the strings are not equal.
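To illustrate the failure, here is a minimal Go sketch. It assumes the guesser uses a pattern along the lines of SPDX-License-Identifier:\s*(.*), which captures everything up to the end of the line; the actual expression in parsers/guesser.go may differ, so naiveRe and captureNaive are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"regexp"
)

// naiveRe is a hypothetical stand-in for the pattern in parsers/guesser.go:
// it captures everything after the identifier up to the end of the line.
var naiveRe = regexp.MustCompile(`SPDX-License-Identifier:\s*(.*)`)

// captureNaive returns the text captured by naiveRe, or "" if there is no match.
func captureNaive(line string) string {
	m := naiveRe.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	for _, line := range []string{
		"/* SPDX-License-Identifier: GPL-2.0 */", // C block comment: "*/" trails the ID
		"// SPDX-License-Identifier: GPL-2.0",    // line comment: nothing trails the ID
		"# SPDX-License-Identifier: GPL-2.0",     // Python/shell comment
	} {
		got := captureNaive(line)
		// An exact comparison only succeeds when nothing follows the license ID.
		fmt.Printf("%q -> %q (equals GPL-2.0: %v)\n", line, got, got == "GPL-2.0")
	}
}
```

Only the block-comment line yields "GPL-2.0 */", which matches the NOASSERTION result from the reproduction steps.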
Solution (attempt)
- Adapt the regular expression. This is probably tricky, or I am just bad at regexes (which is a fact).
- Use a different string comparison. I tested strings.Contains, which seems to work. However, I am not 100% sure whether this breaks matching for licenses with very similar names. I didn't see any, but some license names might fully contain other license names. It does fix the minimal sample above, though.
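Both options can be sketched in Go. The pattern below is an assumption, not the project's code: it keeps the whole expression after the identifier (so GPL-2.0 WITH Linux-syscall-note, which appears in kernel headers, survives intact) and only drops an optional closing */ plus trailing whitespace, so an exact comparison keeps working. The knownLicenses list and the extractID, matchExact and matchContains helpers are hypothetical. The last call shows the risk with strings.Contains: LGPL-2.0 contains GPL-2.0 as a substring, so the result depends on list order.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// spdxRe lazily captures the license expression after the identifier and
// discards an optional closing "*/" plus trailing spaces, tabs and '\r'
// (CRLF files). (?m) lets $ match at the end of every line.
var spdxRe = regexp.MustCompile(`(?m)SPDX-License-Identifier:[ \t]*(.+?)[ \t]*(?:\*/)?[ \t\r]*$`)

// knownLicenses is a hypothetical, deliberately ordered sample of license IDs.
var knownLicenses = []string{"GPL-2.0", "LGPL-2.0", "GPL-2.0-only", "MIT"}

// extractID returns the cleaned license expression, or "" if none is found.
func extractID(text string) string {
	m := spdxRe.FindStringSubmatch(text)
	if m == nil {
		return ""
	}
	return m[1]
}

// matchExact returns the known license equal to id, or "NOASSERTION".
func matchExact(id string) string {
	for _, l := range knownLicenses {
		if id == l {
			return l
		}
	}
	return "NOASSERTION"
}

// matchContains mimics the strings.Contains workaround: the first known
// license that appears anywhere in the captured text wins.
func matchContains(captured string) string {
	for _, l := range knownLicenses {
		if strings.Contains(captured, l) {
			return l
		}
	}
	return "NOASSERTION"
}

func main() {
	fmt.Println(matchExact(extractID("/* SPDX-License-Identifier: GPL-2.0 */"))) // GPL-2.0
	fmt.Println(matchExact(extractID("// SPDX-License-Identifier: LGPL-2.0")))   // LGPL-2.0
	fmt.Println(extractID("/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */"))

	// Pitfall: "LGPL-2.0 */" contains "GPL-2.0", which comes first in the list.
	fmt.Println(matchContains("LGPL-2.0 */")) // GPL-2.0 (wrong license)
}
```

Cleaning the captured text and keeping the exact comparison avoids the substring ambiguity entirely; GPL-2.0-only and GPL-2.0+ would hit the same Contains problem as LGPL-2.0.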