TextMate – Introduction to Language Grammars: How to add source code syntax highlighting embedded in HTML
I’ve blogged about TextMate a few times in the past, and with good reason – it’s an extremely versatile, lightweight, powerful text editor for the Mac. One great feature of TextMate is its extreme customizability. Today I’m going to show how to modify one of the TextMate language files in order to add support for Java code within HTML text.
Why is this useful? My workflow for producing blog posts is often to write the post in TextMate using the Markdown markup language, which I then convert to HTML. WordPress can syntax highlight and provide a nice monospaced version of source code within a post if it’s delimited by [sourcecode] and [/sourcecode] tags. While the source code comes out fine in the final post, it would be nice to have the syntax highlighting show up from within the Markdown view (i.e. while I am composing a blog post). Let’s get started by looking at how language grammars work in TextMate.
Introduction to Language Grammar Editing
The language support in TextMate is extremely powerful, but it’s a little complicated to get started. In essence, a language defines a series of rules mapping patterns to scopes. For instance, the Java language grammar defines a scope for comments, a scope for control characters, and so on and so forth. The scope is extremely important for many reasons. A few of them are:
- The scope determines whether text is spell checked or not (text under a top-level scope of “source” is not spell checked; text under a top-level scope of “text” is).
- It provides syntax highlighting, as certain scopes are associated with certain colors.
- Snippets can be targeted to only run when within a certain scope. (See this article on Scope selectors for more.) For instance, all the Java snippets are defined as only being active in the source.java scope.
As an aside, you might wonder why the scope is called source.java as opposed to java.source. The reason is that scope selectors match from the left, one dot-separated component at a time: a selector can target the more general case (source), whereas those concerned with Java can target the more specific scope (source.java).
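To make the general-versus-specific idea concrete, here is a minimal Python sketch of prefix matching on scope names. It is a deliberate simplification and not TextMate's implementation: real scope selectors also support things like descendant selectors and exclusions, and the function name selector_matches is made up for this example.

```python
def selector_matches(selector: str, scope: str) -> bool:
    """Return True if `selector` is a component-wise prefix of `scope`.

    Hypothetical helper for illustration only; it ignores the richer
    selector syntax that TextMate actually supports.
    """
    selector_parts = selector.split(".")
    scope_parts = scope.split(".")
    # Compare whole dot-separated components, not raw characters.
    return scope_parts[:len(selector_parts)] == selector_parts


print(selector_matches("source", "source.java"))         # True
print(selector_matches("source.java", "source.java"))    # True
print(selector_matches("source.java", "source.python"))  # False
print(selector_matches("text", "source.java"))           # False
```

Putting the general part first is what lets a single selector such as source cover every programming language at once.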
Since someone has already done the hard work of creating a language definition for Java and all of the snippets that support it, we want to leverage this body of work. All we need to do is ensure that text between the Java [sourcecode] and [/sourcecode] tags is considered to be part of the source.java scope, and everything will just work.
First, let us look at a sample grammar file. Open up the HTML language definition file by going to Bundles -> Bundle Editor -> Edit Languages, or via the shortcut ⌃ ⌥ ⌘L, and choose the HTML option. You’ll be presented with a rather inscrutable, unstyled document to the right. The first thing you should do, which I found out the hard way, is copy all that text and paste it into a new document.
When you paste the text into the document, the text is unstyled and interpreted as plain text. In order to force TextMate to interpret this as a language grammar, you must click the item in the lower middle that says “Plain Text” and choose “Language Grammar” from the dropdown box. The document should look a lot nicer after this step:
Take a look through the grammar, but don’t get bogged down in the details. The important thing to look at is the list of patterns defined. Here’s just a small section:
    patterns = (
        {   name = 'meta.tag.any.html';
            begin = '(<)([a-zA-Z0-9:]++)(?=[^>]*></\2>)';
            end = '(>(<)/)(\2)(>)';
            beginCaptures = {
                1 = { name = 'punctuation.definition.tag.html'; };
                2 = { name = 'entity.name.tag.html'; };
            };
            endCaptures = {
                1 = { name = 'punctuation.definition.tag.html'; };
                2 = { name = 'meta.scope.between-tag-pair.html'; };
                3 = { name = 'entity.name.tag.html'; };
                4 = { name = 'punctuation.definition.tag.html'; };
            };
            patterns = ( { include = '#tag-stuff'; } );
        }
This is the first pattern that will attempt to match. You don’t need to understand all of it, but you should understand that the parentheses in the regular expressions denote capturing groups, which are then referenced by number in the beginCaptures and endCaptures blocks. These assign scopes to the various captured groups. Note too that we can recursively include patterns (via the include = '#tag-stuff' line), which assign scopes to various parts of the matched text. This allows us to define a pattern one time and reference it in multiple places, which cuts down on code duplication.
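If the link between numbered capturing groups and scopes feels abstract, the following Python sketch mimics it. The scope names come from the beginCaptures block above; the regular expression is a simplified stand-in I am assuming for illustration (it only matches the start of a tag), and the names BEGIN, BEGIN_CAPTURES and scopes_for are hypothetical.

```python
import re

# Simplified stand-in for the grammar's begin pattern (an assumption):
# group 1 is the opening angle bracket, group 2 is the tag name.
BEGIN = re.compile(r"(<)([a-zA-Z0-9:]+)")

# Same idea as beginCaptures: capture group number -> scope name.
BEGIN_CAPTURES = {
    1: "punctuation.definition.tag.html",
    2: "entity.name.tag.html",
}


def scopes_for(text):
    """Pair each captured piece of the first match with its scope."""
    match = BEGIN.search(text)
    if match is None:
        return []
    return [(match.group(n), scope) for n, scope in sorted(BEGIN_CAPTURES.items())]


print(scopes_for("<div></div>"))
# [('<', 'punctuation.definition.tag.html'), ('div', 'entity.name.tag.html')]
```

The grammar works the same way in spirit: the regex decides what is captured, and the captures table decides what each captured piece is called.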
If you look through the HTML grammar, you’ll notice that some embedded code is automatically detected and set to have the matching text use the corresponding language:
    ruby = {
        patterns = (
            {   name = 'comment.block.erb';
                begin = '<%+#';
                end = '%>';
                captures = { 0 = { name = 'punctuation.definition.comment.erb'; }; };
            },
Here, any time the <%# %> tag pair is seen, the entire block is assigned the scope comment.block.erb, and the delimiters themselves are additionally assigned punctuation.definition.comment.erb. This has the effect of distinguishing the block from surrounding text. You can see this in action in the following screenshot:
In addition to the fact that the ERB snippet is syntax highlighted, take note of the popup in the screenshot showing “text.html.basic” and “comment.block.erb”. At any point in any TextMate file, you can hit ⌃ ⇧P (Control Shift P) to get the current scope of the cursor. This is extremely useful for debugging why certain elements are not being selected or assigned the scope you think they are.
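The scope popup can be thought of as answering the question "which rules contain the cursor?". Here is a rough Python sketch of that idea for the ERB comment case. It is only an approximation under my own assumptions: the regex is a simplified version of the begin/end pair, the base scope is hard-coded, and scopes_at is a hypothetical helper, not anything TextMate exposes.

```python
import re

# Simplified begin/end pair for an ERB comment, folded into one regex.
ERB_COMMENT = re.compile(r"<%#.*?%>", re.DOTALL)


def scopes_at(text, position):
    """Return the list of scopes covering `position`, outermost first."""
    scopes = ["text.html.basic"]  # the document's top-level scope
    for match in ERB_COMMENT.finditer(text):
        if match.start() <= position < match.end():
            scopes.append("comment.block.erb")
    return scopes


document = "<p>hi</p><%# note %>"
print(scopes_at(document, 2))   # ['text.html.basic']
print(scopes_at(document, 12))  # ['text.html.basic', 'comment.block.erb']
```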
Adding Java support
While using a TextMate window to edit the grammar is extremely nice, unfortunately you cannot test your changes interactively here. You must copy and paste the contents back to the original grammar window, overwriting the contents, and then press Test. This will reload the grammar and you will see the change reflected in any window using that grammar currently.
With that in mind, let’s add the support for embedding Java within our Markdown blog posts.
The basic pattern is pretty simple:
    {   name = 'source.java';
        comment = 'Use Java grammar';
        begin = '\[sourcecode language="java"\]';
        end = '\[/sourcecode\]';
        patterns = ( { include = 'source.java'; } );
    }

I look for the literal string [sourcecode language="java"] to start the pattern, and then the literal string [/sourcecode] to end it. I have to escape the brackets because they have a special meaning within regular expressions ([aeiou] matches any vowel, while \[aeiou\] matches the literal string [aeiou]).

By adding this rule to the top of the patterns, it is run before any of the others. (Remember, we have to actually add it to the HTML grammar within the Bundle Editor, not just the TextMate window with the grammar inside of it.) Once the rule is added and you press Test, the Java highlighting begins to work.
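For anyone who wants to poke at the bracket escaping outside of TextMate, here is a small Python sketch. The first two lines show the character class versus the escaped literal. The rest imitates a begin/end rule by pulling out the text between the delimiters; the opening pattern (which accepts any attributes after sourcecode) and the function name embedded_regions are my own assumptions for the example, and running an unterminated block to the end of the text is a choice made by this sketch.

```python
import re

print(re.findall(r"[aeiou]", "code"))                # ['o', 'e']
print(re.findall(r"\[aeiou\]", "vowels: [aeiou]"))   # ['[aeiou]']

# Assumed opening tag pattern: "[sourcecode", any attributes, then "]".
BEGIN = re.compile(r"\[sourcecode[^\]]*\]")
END = re.compile(r"\[/sourcecode\]")


def embedded_regions(text):
    """Return the text found between each begin/end delimiter pair."""
    regions = []
    position = 0
    while True:
        start = BEGIN.search(text, position)
        if start is None:
            break
        end = END.search(text, start.end())
        if end is None:
            # No closing tag: this sketch treats the rest of the text as the region.
            regions.append(text[start.end():])
            break
        regions.append(text[start.end():end.start()])
        position = end.end()
    return regions


post = 'Intro [sourcecode language="java"]int x = 1;[/sourcecode] outro'
print(embedded_regions(post))  # ['int x = 1;']
```

Everything returned by embedded_regions is the text that the grammar rule would hand over to the Java grammar via the include line.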
Here’s what a snippet of Java embedded in a Markdown blog post looked like without this change:
And after:
Conclusion
Language support in TextMate is a very complex topic, and one that cannot be adequately covered in a single post. I’ve shown here how to add a small rule to the HTML grammar to allow syntax highlighting of source code delimited by special blocks. This technique could be expanded to support any number of other programming languages.
The ability to customize TextMate through editing snippets and language grammars makes it extremely powerful. I hope this has only whetted your appetite to learn more. If it has, please see the macromates site which has more information about this.
Looks like it is using Scintilla.
I can’t find any references to TextMate using Scintilla under the hood. Are you going based off of visual similarities? Or the language grammar?
The page format is pretty broken. Could you fix it?
WordPress really screwed up my formatting. See the original post at https://github.com/I82Much/developmentality-blog-posts/blob/master/completed/Adding%20language%20support.mdown