[v2] Language combinations / extensions / embeddings / ... #3927
Thinking about this some more:
|
There are almost none. There is |
Nice catch! Now I'm wondering WTF is |
I've been thinking a lot about the best way to handle language definitions that depend on or make use of other languages. Some earlier thoughts in:
- `$language` for highlighting languages embedded in other languages #3923

I have a strong hunch that these are all facets of the same problem, and that a good API design will minimize the number of separate solutions needed for them, so I'm going to close all three so we can discuss them holistically here.
Note
This is a work in progress and I will update it as I think more about this problem.
But before I go into the weeds, an illustration (real screenshot of our code, taken from VS Code):

This is 4 languages nested in each other!
Use cases
There are currently two types of dependencies:
And several types of use cases described below.
_Note: Any mention of "now" refers to the `simplify` branch (draft PR) and not the current `v2` branch._

1. Languages using another language as a base (e.g. JavaScript using C-like)
This is the most straightforward case: just simple inheritance.
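- The base language is declared via a `base` key instead of an imperative `extend()` call (I wonder if `parent` or `extends` may be better names) and is considered a required dependency.
- The base grammar is available to the `grammar()` function via a `base` key that resolves synchronously.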
key that resolves synchronously.$merge
does a deep merge of certain tokens instead$insertBefore
inserts certain tokens before another$insertAfter
inserts certain tokens after another$insert
is a shorthand version of$insertBefore
/$insertAfter
that is better suited to one-off inserts as the position is specified inside its value via$before
/$after
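To make the descriptors above concrete, here is a minimal sketch of how a child grammar could be resolved against its base. Everything in it is an assumption for illustration: `resolveGrammar`, `insertTokens`, and `deepMerge` are hypothetical helpers, and the `{ anchorToken: { newTokens } }` shape for `$insertBefore`/`$insertAfter` is a guess, not the actual API on the `simplify` branch.

```javascript
// Hypothetical sketch, not Prism's implementation.
// Token order matters in a grammar, so inserts rebuild the object key by key.
function insertTokens(grammar, anchor, tokens, position) {
  if (!(anchor in grammar)) {
    throw new Error(`Unknown anchor token: ${anchor}`);
  }
  const result = {};
  for (const [name, value] of Object.entries(grammar)) {
    // A token being inserted replaces any same-named token elsewhere.
    if (name in tokens && name !== anchor) continue;
    if (name === anchor && position === 'before') Object.assign(result, tokens);
    result[name] = value;
    if (name === anchor && position === 'after') Object.assign(result, tokens);
  }
  return result;
}

function isPlainObject(value) {
  return typeof value === 'object' && value !== null &&
    !Array.isArray(value) && !(value instanceof RegExp);
}

// Returns a new object; neither argument is mutated.
function deepMerge(target, source) {
  const result = { ...target };
  for (const [key, value] of Object.entries(source)) {
    result[key] = isPlainObject(value) && isPlainObject(result[key])
      ? deepMerge(result[key], value)
      : value;
  }
  return result;
}

function resolveGrammar(base, child) {
  const { $merge, $insertBefore, $insertAfter, ...overrides } = child;
  // Plain keys override base tokens in place (or append new ones at the end).
  let grammar = { ...base, ...overrides };
  if ($merge) grammar = deepMerge(grammar, $merge);
  for (const [anchor, tokens] of Object.entries($insertBefore ?? {})) {
    grammar = insertTokens(grammar, anchor, tokens, 'before');
  }
  for (const [anchor, tokens] of Object.entries($insertAfter ?? {})) {
    grammar = insertTokens(grammar, anchor, tokens, 'after');
  }
  return grammar;
}

// Toy grammars, far simpler than the real `clike` and `javascript`.
const clike = {
  comment: /\/\/.*/,
  string: { pattern: /"[^"]*"/, greedy: true },
  keyword: /\b(?:if|else|while)\b/,
};

const javascript = resolveGrammar(clike, {
  keyword: /\b(?:if|else|while|let|const)\b/,
  $merge: { string: { alias: 'literal' } },
  $insertBefore: { keyword: { regex: /\/[^\/\n]+\/[a-z]*/ } },
});

console.log(Object.keys(javascript)); // [ 'comment', 'string', 'regex', 'keyword' ]
```

The point of the sketch is that a declarative child definition can be resolved into a fresh grammar while the base object stays untouched.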
Usually, the base language is useful on its own. E.g. `clike` was not just created to make its child languages more DRY, but to have something to fall back on when one wanted to highlight a C-like language that did not have a dedicated language definition (admittedly far more important when Prism first launched with like 5 languages compared to now).

These days, there are also cases where language definitions exist for the sole reason of making other language definitions more DRY (such as `javadoclike`), which is a perversion of the concept. No language should be registered and become available as a `language-xxx` class if it's not useful on its own; otherwise it's not actually a language, it's a shared utility.

2. Languages embedding/embedded in other languages
This can be broken down into two major categories:
1. The embedded language is known in advance (e.g. inside `<script>` elements)
2. The embedded language depends on the code being matched (e.g. in a `markdown` code block or when highlighting `http` requests)

1 was already handled by special-casing strings as values of `$rest`/`$inside`, but 2 was severely problematic and required a fair bit of custom code. #3923 proposes a `$language` descriptor that can handle both, by also accepting a function that takes the named capturing groups as a parameter. I'm still unsure if this is a good solution.

There is also the question of what types of dependencies these are: are they optional or required? It seems like it could go either way, depending on the user's goal, but I'm leaning towards required. But then, for 2, does that mean that now your required dependencies depend on the code being matched?
Perhaps these are actually the only true "optional" dependencies, and there should be a way for Prism users to autoload these as well. In that case, perhaps grammars should support async nodes for these that resolve when they are loaded. The way code is tokenized could easily support parts of it being deferred for later.
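As a rough illustration of the `$language` idea from #3923, the sketch below resolves an embedded language id either from a fixed string or from a function that receives the named capturing groups of the match. The helper name `resolveLanguage`, the token shapes, and both patterns are assumptions made up for this example; it also stays synchronous and does not attempt the deferred/async loading floated above.

```javascript
// Hypothetical sketch: `resolveLanguage` and these token shapes are invented
// for illustration and are not Prism's API.
function resolveLanguage(descriptor, match) {
  const id = typeof descriptor === 'function'
    ? descriptor(match?.groups ?? {})
    : descriptor;
  // An empty or missing id means "no embedded language could be determined".
  return typeof id === 'string' && id !== '' ? id.toLowerCase() : null;
}

// Category 1: the embedded language is fixed.
const scriptToken = {
  pattern: /<script>(?<code>[\s\S]*?)<\/script>/,
  $language: 'javascript',
};

// Category 2: the embedded language is read from the code being matched.
// (Tilde fences are used here only to keep the example easy to read.)
const fencedCodeToken = {
  pattern: /^~~~(?<lang>[\w-]*)\n(?<code>[\s\S]*?)\n~~~$/m,
  $language: (groups) => groups.lang,
};

const source = '~~~CSS\na { color: red }\n~~~';
const match = fencedCodeToken.pattern.exec(source);

console.log(resolveLanguage(fencedCodeToken.$language, match)); // css
console.log(resolveLanguage(scriptToken.$language, null));      // javascript
```

With this shape, whether the resolved id counts as a required or an optional dependency is decided by the caller after resolution, which is exactly the open question raised above.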
3. Languages that are preprocessors for other languages
Example: PHP or Liquid are HTML preprocessors.
This is what #3911 was about. It is further complicated by the fact that these preprocessors could often generate anything, but definitely do have defaults (usually `markup`). This is the prime reason custom tokenizers exist, which I would love to get rid of.

I no longer think most of #3911 was a good idea, but there is one part that I still think was: languages being able to declare what language they produce, and have that be overridable via two-id language classes (e.g. `language-diff-css` to highlight a CSS diff or `language-liquid-css` for a Liquid template that produces CSS).

I'm still unsure how exactly these work today, but perhaps a good solution to 2 could also address these (by essentially emulating `$rest: childLanguage`).

4. Languages that can make other language definitions "richer" but are not strictly necessary
This one is the hairiest category as it encompasses so many diverse use cases.
Examples:
- `javastacktrace` extending `log`
- […] `markup` is loaded. This one is basically highlighting the need for a shared utility for `tag`.
- […] `http` being highlighted as JSON if that is loaded, or as JS otherwise. That seems to be a bona fide optional dependency.
- `markdown` in `graphql` comments being highlighted if it is loaded. That seems to be a bona fide optional dependency.
- `jsdoc` in JS `doc-comment` tokens is highlighted if `jsdoc` is loaded. That seems to be a bona fide optional dependency.
- `js-templates` extending JS with the ability to highlight template literals tagged with a certain language. Not everyone highlighting JS wants to highlight tagged template literals, but since JS is the host language, it cannot be `language-js-templates` that activates this functionality.
- `opencl-extensions` extending C and C++. Not everyone highlighting C/C++ wants this.
- `css-extras` extending `css` with specialized tokens for selectors etc. Not everyone highlighting CSS wants the granularity of `css-extras`.

Languages should not modify other languages
Previously, there were more of these, which existed for the sole reason of reducing bundle sizes to the extreme (like saving 1KB). These are now eliminated. The ones that remain are those that fundamentally should involve user choice, as described above.
The toughest of all are those like the last three, which are also currently the only uses for `extends` (#3911). Languages extending other languages are deeply problematic.

Optional dependencies beyond actual embedding are a smell
Even in use cases that are "proper" optional dependencies, it feels that this logic should really live with the child language. But …if it does, that would mean the child language modifies the parent language, which, as described above, is evil!
Not necessarily. Languages modifying other languages was one way to do it. What if there are others?
Essentially, in all of these, we have one language adding granularity to another. In most of them, we don't want users to have to opt in separately for every use, so languages modifying other languages was invented as a solution to that. E.g. you may want all your CSS examples to be highlighted with the granularity of `css-extras`, and it would be annoying (and incompatible) to have to specify `language-css-extras` each time. In many cases, an opt-in doesn't make much sense at all, and it's really about not bloating the bundle size. E.g. of course you want to highlight JS in HTML if you have a JS language definition loaded.

Ideas
These are currently mainly for 4. I have some ideas for the rest, but it's 4 that is the hairiest.
1. Language aliases
We could extend the concept of language aliases to existing languages. Then e.g. `css-extras` could be defined as just regular inheritance over `css`, and one could alias `css` to `css-extras`.

2. Language extensions layered on top of existing languages without modifying them
Instead of language extensions actually mutating the host language, what if languages could declare that they are automatically applied within certain tokens of other languages?
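One way to picture this is a registry where an extension declares which token of which host language it applies within, and the layered grammar is assembled on lookup. This is only a sketch under stated assumptions: `Registry`, `addLanguage`, `addLayer`, and `getGrammar` are invented names, and the toy `css` grammar is much simpler than the real one.

```javascript
// Hypothetical sketch: none of these names exist in Prism.
class Registry {
  constructor() {
    this.languages = new Map();
    this.layers = [];
  }

  addLanguage(id, grammar) {
    // Freezing (shallowly) makes accidental mutation of the host fail loudly in strict mode.
    this.languages.set(id, Object.freeze({ ...grammar }));
  }

  // A layer declares the host language and the host token it applies within.
  addLayer({ host, token, inside }) {
    this.layers.push({ host, token, inside });
  }

  // Builds a fresh grammar on every call; the registered host is never modified.
  getGrammar(id) {
    const base = this.languages.get(id);
    if (!base) return undefined;
    const grammar = { ...base };
    for (const layer of this.layers) {
      if (layer.host !== id || !(layer.token in grammar)) continue;
      const token = grammar[layer.token];
      const normalized = token instanceof RegExp ? { pattern: token } : token;
      grammar[layer.token] = {
        ...normalized,
        inside: { ...normalized.inside, ...layer.inside },
      };
    }
    return grammar;
  }
}

const registry = new Registry();
registry.addLanguage('css', {
  selector: /[^{}\s][^{}]*(?=\s*\{)/,
  property: /[-\w]+(?=\s*:)/,
});

// What a `css-extras`-style extension might declare instead of mutating `css`.
registry.addLayer({
  host: 'css',
  token: 'selector',
  inside: { 'pseudo-class': /:[-\w]+/ },
});

const css = registry.getGrammar('css');
console.log(Object.keys(css.selector.inside));                    // [ 'pseudo-class' ]
console.log(registry.languages.get('css').selector instanceof RegExp); // true
```

Because layers are applied at lookup time, the order in which the host and the extension were loaded no longer affects the result.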
3. Language modifications with defined ordering
Languages like `css-extras` are never autoloaded, right? They need to be explicitly loaded …somehow. So perhaps the ordering effects go away on their own in v2, simply because loading order is much better defined.

Additionally, we could soften the blow by making it configurable with a Prism config option for how to handle `extends` languages:
- […] `language-css-extras` explicitly to use `css-extras`

In fact, we could create the new language definition anyway.
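A small sketch of what such a config option could look like follows. The option name `extendsMode`, its values `'explicit'` and `'replace-host'`, and the `registerExtension` helper are all invented for illustration; the issue does not specify them. The extended grammar is always registered under its own id, and it only takes over the host id when asked to.

```javascript
// Hypothetical sketch: `extendsMode` and its values are not a real Prism option.
function registerExtension(languages, extension, { extendsMode = 'explicit' } = {}) {
  const host = languages.get(extension.extends);
  if (!host) {
    throw new Error(`Host language not loaded: ${extension.extends}`);
  }
  // The extended grammar is always created as a language of its own...
  const extended = { ...host, ...extension.grammar };
  languages.set(extension.id, extended);
  // ...and only replaces the host id when the user opted into that.
  if (extendsMode === 'replace-host') {
    languages.set(extension.extends, extended);
  }
  return extended;
}

const languages = new Map([['css', { property: /[-\w]+(?=\s*:)/ }]]);
const cssExtras = {
  id: 'css-extras',
  extends: 'css',
  grammar: { hexcode: /#[\da-f]{3,8}/i },
};

// Default mode: `language-css-extras` must be used explicitly.
registerExtension(languages, cssExtras);
console.log(Object.keys(languages.get('css')));        // [ 'property' ]
console.log(Object.keys(languages.get('css-extras'))); // [ 'property', 'hexcode' ]
```

In `'replace-host'` mode the host id simply points at the extended grammar, which behaves much like aliasing `css` to `css-extras`.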