HTML/CSS tips to reduce use of JavaScript

June 3, 2021 i82much Leave a comment

https://calendar.perfplanet.com/2020/html-and-css-techniques-to-reduce-your-javascript/

Great article with examples of uses of JavaScript and how they can be replaced with HTML or CSS. I had no idea these techniques were available.

Categories: Uncategorized Tags: css, html, javascript, web design

Using BeautifulSoup to extract WordPress.com blog post metadata

May 9, 2014 i82much 1 comment

I want to analyze the popularity of my posts in order to better understand which topics are important to my audience. In my last post about the topic, I showed how to retrieve viewership data about your WordPress.com blog. By itself this data doesn’t tell you much: you get a high-level view of the popularity of the blog over time, as well as the traffic for each post. I wanted to go a bit deeper and pull in metadata about the posts themselves, not just their identifiers. This post will show you how to download some raw data and use BeautifulSoup and Python to clean and extract the key metadata.

When faced with a data analysis task, I usually go through the following tasks:

Find the data – what data do you need? Where can you get it?
Extract the data – after you have the raw data, extract meaningful signal from the noise
Clean the data – filter out erroneous or corrupted records
Analyze the data – extract meaning/insight from the data

This post will detail the first two phases.

Find the data

I’m interested in answering questions such as

Do posts about Python get more views, or Java?
Does the time of day I post make a difference?
Do tags matter?
How about the length of a post?

With these questions in mind, I can start to formulate what an ideal data source would look like. In protocol buffer syntax, I’d want something like the following:

message Post {
    // The unique identifier of the post
    optional string id = 1;

    // What was the title of the post?
    optional string title = 2;

    // What is the URL to the post?
    optional string url = 3;

    // Publishing date, in YYYY-MM-DD HH:MM format
    optional string publish_date = 4;

    // How was this post categorized?
    repeated string categories = 5;

    // How was the post tagged?
    repeated string tags = 6;
}

The API I uncovered in my last post does not contain any of this post metadata. Fortunately I found another source – the WordPress admin dashboard of posts. Navigate to https://yourblog.wordpress.com/wp-admin/edit.php or click on the Posts category on the left hand side while logged into the administrator dashboard.

Download the raw data

PostsDashboard

Parsing HTML to extract metadata is not ideal because it is very brittle – if WordPress changes the format of the table containing this data, I would need to rewrite the script that processes it. With no other alternatives, I’m willing to take that chance.

The first step to download the data is to ensure that the table can fit all of your posts; by default it only shows around 10 posts on a page.

Click the “Screen Options” in the upper right corner.

ChangeScreenOptions

Change the number of posts shown to the max (300) and click Apply. If you have more than 300 posts, you’ll have to repeat the remaining steps once for each page of results.

SetTo300

Next, right click on the table and choose Inspect Element (I assume you’re using Chrome; if you’re not, you can just save the entire page as HTML and pick out the table element manually).

InspectTable

Navigate until you find the <table> element and select it. Right click and choose ‘Copy as HTML’.

CopyAsHTML

At this point you have the entire set of metadata about your posts as HTML in your clipboard. Create a new file and paste the data into it. Save it somewhere you can find it later; I called mine “all_posts.html”.

Extract the metadata using BeautifulSoup

We’ll be using BeautifulSoup, an excellent Python library for parsing HTML and XML files. In brief, it allows us to search a hierarchical document for nodes matching certain criteria and extract data from those nodes.

Here is a table row in the HTML with the location of various pieces of metadata illustrated:

Table row illustration

After installing the library, create a new Python script, import the library, and create a BeautifulSoup object from the raw text of the HTML document:

from bs4 import BeautifulSoup

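def main():
    # Name the parser explicitly so the result doesn't depend on which
    # parsers happen to be installed.
    with open("all_posts.html") as f:
        soup = BeautifulSoup(f, "html.parser")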

if __name__ == '__main__':
    main()

The BeautifulSoup object allows us to search for our metadata. Let’s start by finding all of the table rows, since they are the location of the data about each post.

# Extract all of the tr id="post" rows.
# <tr id="post-357234106" class="post-357234106 type-post status-publish format-standard hentry category-photo alternate iedit author-self level-0" valign="top">
trs = soup.find_all('tr')

find_all is a key method in the BeautifulSoup API; you give it some criteria and get back a collection of matching nodes. If none are found, it returns an empty list. Its complement is find, which returns the first matching node, or None if nothing matches.
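To make the difference concrete, here is a minimal sketch run against a made-up two-row table. The HTML string is my own stand-in for illustration, not the real dashboard markup.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the dashboard table; the real markup is much larger.
html = """
<table>
  <tr id="post-1"><td>First</td></tr>
  <tr id="post-2"><td>Second</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = soup.find_all("tr")      # every match, in document order
first_row = soup.find("tr")     # only the first match
no_links = soup.find_all("a")   # there are no <a> tags, so this is empty
no_link = soup.find("a")        # no match at all, so this is None

print(len(rows), first_row["id"], len(no_links), no_link)
```

The practical upshot is that a loop over find_all is always safe, while the result of find needs a None check before you use it.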

Next we loop through the table rows, throwing out the ones that don’t have a post ID and thus don’t represent posts.

for tr in trs:
    # Only care about the tr's with ids. These represent the posts.
    post_id = tr.get('id')
    if post_id is None:
        continue

Here we use the get function of the BeautifulSoup API, which allows you to look up attributes of nodes. If the attribute is not present, get returns None. Just like a normal dictionary in Python, you can use the index operation if you’re sure that the key is present. For instance,

post_id = tr['id']

This will raise a KeyError if the key doesn’t exist. If I’m sure that the node has this attribute, this is a good way to extract the data; if I’m not sure then I’ll use get.

With get, I can also provide a default value to use if the key isn’t present:

post_id = tr.get('id', 'fallback_value')

Note that these nodes don’t behave entirely like standard dictionaries. For instance, it’s standard to check for presence of a key in a dictionary as follows:

if 'key' in the_dict:

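This won’t work the way you expect for the nodes: the in operator checks a node’s children rather than its attributes. To test for an attribute, use the node’s has_attr method instead.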
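Here is a small sketch of the three lookup styles side by side, using a made-up row. The valign attribute is deliberately left off the row so that the missing-key cases are visible.

```python
from bs4 import BeautifulSoup

# Made-up row for illustration.
soup = BeautifulSoup('<table><tr id="post-456"><td>x</td></tr></table>', "html.parser")
tr = soup.find("tr")

# get returns None (or the default you pass) for a missing attribute.
missing = tr.get("valign")                 # None
fallback = tr.get("valign", "top")         # 'top'

# Indexing raises KeyError for a missing attribute.
try:
    tr["valign"]
    raised = False
except KeyError:
    raised = True

# `in` looks through the node's children, not its attributes.
found_with_in = "id" in tr                 # False, although id is set
found_with_has_attr = tr.has_attr("id")    # True
found_in_attrs = "id" in tr.attrs          # True: attrs is a plain dict

print(missing, fallback, raised, found_with_in, found_with_has_attr, found_in_attrs)
```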

The id of the node contains some extra cruft that we don’t need – namely a ‘post’ prefix. For instance, <tr id="post-456">. Strip off the extra prefix with standard string functions:

post_id = post_id.replace('post-', '')

Next we look for the anchor node underneath the table row which contains the URL of the post. In the table, this always has the text ‘View’. For instance,

<a href="https://developmentality.wordpress.com/2009/03/10/to-write-clean-code-you-must-first-write-dirty-code-and-then-clean-it/" title="View “To write clean code, you must first write dirty code; and then clean&nbsp;it.”" rel="permalink">View</a>

This is simple in BeautifulSoup:

# Get the published URL
url = tr.find('a', text='View')['href']

Here I use find rather than find_all because I expect exactly one such node. I use ['href'] rather than the get syntax because it’s a simple script and I expect all such nodes to have URLs; it’s a fatal error if they don’t.

There is a large hidden div underneath the post table row containing extra metadata about the post, including the publish date. For instance,

<div class="hidden" id="inline_85408649">
    <div class="post_title">To write clean code, you must first write dirty code; and then clean it.</div>
    <div class="post_name">to-write-clean-code-you-must-first-write-dirty-code-and-then-clean-it</div>
    <div class="post_author">881869</div>
    <div class="comment_status">open</div>
    <div class="ping_status">open</div>
    <div class="_status">publish</div>
    <div class="jj">10</div>
    <div class="mm">03</div>
    <div class="aa">2009</div>
    <div class="hh">23</div>
    <div class="mn">18</div>
    <div class="ss">29</div>
    <div class="post_password"></div><div class="post_category" id="category_85408649">196,3099</div><div class="tags_input" id="post_tag_85408649"></div><div class="sticky"></div><div class="post_format"></div></div>

To find the div, we could do something like the following:

divs = tr.find_all('div')
for div in divs:
    # class can hold several values, so get returns a list of class names
    if 'hidden' not in div.get('class', []):
        continue
    # we found it

There’s a better way – we can filter on the class attribute directly when we call find or find_all. We pass it as a keyword argument; note that we have to call it class_ rather than class because class is a reserved keyword in Python.

metadata = tr.find('div', class_='hidden')
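A quick sketch of how class matching behaves, on a made-up fragment where the div carries two classes (the second class name, inline, is invented for the example):

```python
from bs4 import BeautifulSoup

# Made-up fragment; the second div carries two classes.
html = ('<table><tr><td><div class="row-title">t</div>'
        '<div class="hidden inline">meta</div></td></tr></table>')
soup = BeautifulSoup(html, "html.parser")

# class_ matches if any one of the node's classes equals the value.
metadata = soup.find("div", class_="hidden")

# class is a multi-valued attribute, so get hands back a list of names.
classes = metadata.get("class")

# The attrs dictionary form is equivalent and avoids the trailing underscore.
same_node = soup.find("div", attrs={"class": "hidden"})

print(classes, metadata.text, same_node is metadata)
```

Because get('class') is a list, comparing it to a plain string never succeeds, which is one more reason to prefer the class_ keyword over a manual loop.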

Once we have this node, we apply the same technique to pull out the title and the year, month, and day of publication. The text attribute returns the text of the node.

metadata = tr.find('div', class_='hidden')
title = metadata.find('div', class_='post_title').text
publish_day = metadata.find('div', class_='jj').text
publish_month = metadata.find('div', class_='mm').text
publish_year = metadata.find('div', class_='aa').text
publish_date = '%s-%s-%s' %(publish_year, publish_month, publish_day)
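The protocol buffer sketch earlier asked for a YYYY-MM-DD HH:MM publish date, and the hidden div also carries hh and mn values. One possible way to fold those in is shown below; the fields dictionary is a hypothetical stand-in for the text of the corresponding divs, and build_publish_date is a name I made up for this example.

```python
from datetime import datetime

# Hypothetical stand-in for the text of the aa/mm/jj/hh/mn divs shown above.
fields = {"aa": "2009", "mm": "03", "jj": "10", "hh": "23", "mn": "18"}

def build_publish_date(fields):
    """Combine the date parts into a datetime, failing loudly on bad input."""
    stamp = "{aa}-{mm}-{jj} {hh}:{mn}".format(**fields)
    # strptime raises ValueError if any part is out of range.
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M")

published = build_publish_date(fields)
print(published.strftime("%Y-%m-%d %H:%M"))  # 2009-03-10 23:18
```

Going through datetime rather than plain string formatting means an out-of-range month or day is caught right away, and the hour is available later for the time-of-day question.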

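Finally, we pull out the tags and categories of the post. The tags are in a div underneath this hidden div. The hidden div only lists the categories as numeric IDs, so we read the category names from the categories column of the table row instead: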

# Find the tags, if they're present
tags = []
tags_div = metadata.find('div', class_='tags_input')
if tags_div:
  tags = tags_div.text.split(', ')

# Find the categories - the node should always be present
categories_td = tr.find('td', class_='column-categories')
categories = [x.text for x in categories_td.find_all('a')]

I use a slightly different technique for the tags than the categories because each category is a separate anchor node, as opposed to the tags which are in the text of one node.
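One wrinkle with splitting the tag text: splitting an empty string still yields a list containing one empty string, which is why a post without tags can end up with a single blank tag. A small helper (split_tags is my own name, not part of the original script) avoids that:

```python
def split_tags(text):
    """Split a comma-separated tag string, dropping empty entries."""
    return [tag.strip() for tag in text.split(",") if tag.strip()]

print("".split(", "))                       # [''] - one empty string, not an empty list
print(split_tags(""))                       # []
print(split_tags("chart, chart junk, UI"))  # ['chart', 'chart junk', 'UI']
```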

After going through this procedure, we have a lot of information about each post. In order to hold the data about each post, we could create a class with the appropriate fields. For now, the class is a simple holder of variables with no behavior attached to it. As such it’s a great candidate for using the namedtuple functionality of the collections library.

import collections
post_metadata = collections.namedtuple('metadata', ['id', 'publish_date', 'title', 'link', 'categories', 'tags'])

This creates an immutable class with the fields I provided. This saves a bunch of boilerplate and automatically implements correct equality and __str__ functions. For instance,

a = post_metadata(id='48586', publish_date='2010-04-26', title='Some Post', link='http://some/link', categories=[], tags=['programming'])
print(a)
metadata(id='48586', publish_date='2010-04-26', title='Some Post', link='http://some/link', categories=[], tags=['programming'])
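To see what "immutable" and "correct equality" buy you, here is a short sketch with the same field names. The PostMetadata name and the sample values are made up for the example.

```python
import collections

# Same field names as the post_metadata tuple above.
PostMetadata = collections.namedtuple(
    'metadata', ['id', 'publish_date', 'title', 'link', 'categories', 'tags'])

a = PostMetadata('1', '2010-04-26', 'Some Post', 'http://some/link', [], ['programming'])
b = PostMetadata('1', '2010-04-26', 'Some Post', 'http://some/link', [], ['programming'])

same = (a == b)            # True: equality compares field by field

try:
    a.title = 'New title'  # fields cannot be reassigned
    reassigned = True
except AttributeError:
    reassigned = False

# _replace returns a modified copy and leaves the original alone.
c = a._replace(title='New title')
print(same, reassigned, a.title, c.title)
```

Note that the immutability is shallow: the lists held in categories and tags can still be appended to.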

For each post table row, we create one such post_metadata instance with all the attributes filled in.

trs = soup.find_all('tr')
posts = []
for tr in trs:
    # ... extract post_id, url, title, publish_date, tags, and categories as shown above ...
    data = post_metadata(id=post_id,
                publish_date=publish_date,
                title=title,
                link=url,
                categories=categories,
                tags=tags)
    posts.append(data)

At the end of the script, we now have all the metadata about each post.

metadata(id=u'369876516', publish_date=u'2012-06-09', title=u'Wind Map - a visualization to make Tufte proud', link=u'https://developmentality.wordpress.com/2012/06/09/wind-map-a-visualization-to-make-tufte-proud/', categories=[u'UI'], tags=[u'chart', u'chart junk', u'climate', u'color', u'edward tufte', u'elevation maps', u'hue', u'intensity', u'michael kleber', u'quantitative', u'science', u'tufte', u'UI', u'visualization'])
metadata(id=u'369876270', publish_date=u'2011-04-01', title=u"WordPress Stats April Fool's", link=u'https://developmentality.wordpress.com/2011/04/01/wordpress-stats-april-fools/', categories=[u'Uncategorized'], tags=[u"april fool's", u'wordpress'])
metadata(id=u'369876110', publish_date=u'2011-01-25', title=u'WorkFlowy - free minimalist list webapp', link=u'https://developmentality.wordpress.com/2011/01/25/workflowy-free-minimalist-list-webapp/', categories=[u'UI', u'Uncategorized'], tags=[u'breadcrumb', u'getting things done', u'hierarchy', u'lists', u'nested', u'nodes', u'todo', u'UI', u'webapp', u'workflowy'])
metadata(id=u'80156276', publish_date=u'2009-02-21', title=u'WriteRoom', link=u'https://developmentality.wordpress.com/2009/02/21/writeroom/', categories=[u'link'], tags=[u''])

The last step of today’s post is to output the data as a CSV file. Unfortunately, the csv module in Python 2 does not handle Unicode text, and the table contains Unicode characters. As such we’ll use the UnicodeWriter class that the Python docs include.

import collections
import csv
import sys

columns = ['id', 'publish_date', 'title', 'link', 'categories', 'tags']
post_metadata = collections.namedtuple('metadata', columns)

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
# snip the definition from http://docs.python.org/2/library/csv.html#csv.writer

writer = UnicodeWriter(sys.stdout)
writer.writerow(columns)
for post in posts:
    row = [post.id, post.publish_date, post.title, post.link, ','.join(post.categories), ','.join(post.tags)]
    writer.writerow(row)

We then invoke the Python script and redirect the output to our CSV file. I’ve uploaded a slightly redacted version of the CSV file to Google Docs; you can view it here. The final version of the script is available on GitHub.
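If you are on Python 3, the csv module works with Unicode text directly, so a helper class like UnicodeWriter should not be necessary. The following is only a sketch under that assumption; write_posts, the Post name, and the sample row are all invented for illustration.

```python
import collections
import csv
import io

columns = ['id', 'publish_date', 'title', 'link', 'categories', 'tags']
Post = collections.namedtuple('metadata', columns)

def write_posts(posts, out):
    """Write posts as CSV rows to any text-mode file-like object."""
    writer = csv.writer(out)
    writer.writerow(columns)
    for post in posts:
        writer.writerow([post.id, post.publish_date, post.title, post.link,
                         ','.join(post.categories), ','.join(post.tags)])

# Made-up sample row with a non-ASCII character in the title.
posts = [Post('1', '2011-03-23', 'Pandoc – an essential tool', 'http://some/link',
              ['textmate', 'unix'], ['markdown', 'pandoc'])]

buffer = io.StringIO()
write_posts(posts, buffer)
print(buffer.getvalue())

# For a real file: open('posts.csv', 'w', newline='', encoding='utf-8')
```

The writer quotes any field that contains a comma, so the joined categories and tags each stay in a single column.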

MetadataCsv

In my next post I will show how to join this metadata with the view data we accessed via the API in last week’s post in order to gain insight into which types of posts provide value to readers.

Categories: programming, Python Tags: analyze, beautifulsoup, blog, cleaning data, data analysis, html, metadata, python, scripting, stats, wordpress.com

Pandoc – an essential tool for Markdown users

March 23, 2011 i82much 7 comments

Pandoc is a great tool to convert between various text based formats. For instance, with a single input Markdown file, I can generate an HTML page of that document, a LaTeX document, and a beautifully typeset PDF.

I had trouble installing it on Mac OS X via MacPorts; a simpler solution for me was to download and install the Haskell package and then use the commands:

cabal update
cabal install pandoc

This assumes, of course, that the cabal program that the Haskell package installs is accessible from your path.

The next step for me was to install the excellent Pandoc TextMate bundle. This gives you the standard things like syntax highlighting of your document, as well as a variety of useful snippets. For instance, when I am in Pandoc mode and press ⌃ ⌥ ⌘ P, I get the following popup from which I can easily choose options via mouse or keyboard:

Easy way to preview your document in various output formats

Before you can start using the Pandoc TextMate bundle, you must ensure that the Pandoc executable is on the PATH exposed to TextMate, which is different than your global system path. In other words, just because you can execute pandoc in a shell and have it work, this doesn’t mean it will work in TextMate. For instance, on my computer, Pandoc is located in:

$ which pandoc
/Users/ndunn/Library/Haskell/bin/pandoc

Go to TextMate -> Preferences -> Advanced -> PATH and append :/Users/ndunn/Library/Haskell/bin to the end of the PATH variable.

Appending the Pandoc path to the PATH variable

Pandoc makes a few extensions to the Markdown syntax, which I really like. For instance, you can designate a section of text to be interpreted literally by surrounding it with three ~ characters. Furthermore, you can specify what language the source code is in, and the Pandoc converter will syntax highlight it in the final document (assuming the correct extensions have been installed).

I like this setup because it allows you to specify the language of the block of text, which means that you can force TextMate to interpret it the same way. As I’ve blogged about previously, one can add source code syntax highlighting embedded in HTML documents. I added the following lines to my HTML language grammar in order to have a few different languages recognized and interpreted as source code within these delimited blocks.

Here is the relevant section:

        {   name = 'source.java';
            comment = 'Use Java grammar';
            begin = '~~~\s*{.java}';
            end = '~~~';
            patterns = ( { include = 'source.java'; } );
        },
        {   name = 'text.xml';
            comment = 'Use XML grammar';
            begin = '~~~\s*{.xml}';
            end = '~~~';
            patterns = ( { include = 'text.xml'; } );
        },
        {   name = 'source.shell';
            comment = 'Use Shell grammar';
            begin = '~~~\s*{.shell}';
            end = '~~~';
            patterns = ( { include = 'source.shell'; } );
        },
        {   name = 'source';
            begin = '~~~';
            end = '~~~';
            patterns = ( { include = 'source'; } );
        },

(One tricky bit to get used to is that you need at least one blank line between the surrounding text and a ~~~ delimited block, or else the ~ characters are interpreted as strikeouts through the text.)

Here is a screenshot of this working in TextMate:

Syntax highlighting of sourcecode within the Pandoc document

Finally, just to get really meta on you, here’s a screenshot of the text of this document

Text version of the document

followed by a screenshot of the HTML that Pandoc produces: HTML version of the document

followed by a screenshot of the PDF that LaTeX formatted via Pandoc: PDF version of the document

I hope this has piqued your interest in Pandoc. I love the beautiful output of LaTeX but hate working with its syntax. With Pandoc I’m free to compose in Markdown, a language with a very lightweight syntax, and then convert into TeX when and if I want to.

Categories: textmate, unix Tags: formatting, haskell, html, markdown, markup, pandoc, pdf, plaintext, textmate

JS 101 Week 5: Event handling

February 26, 2011 i82much Leave a comment

2011-02-23

Reflection

Why is it better to use either trickling or bubbling of events? Why not just give the event to the node on which it was invoked?

There are two main reasons to use the trickling down or bubbling up of events. The first is performance. If we have 100 different images on a page and we want each one to respond to the hover event, it is very inefficient to create 100 different event listeners and attach one to each element. We can instead create a single hover event listener on the <div> which contains all of these pictures, and from within that event handler, determine which of its enclosed <img> elements was hovered over. The second is flexibility: when the event bubbles up the parent chain, the programmer is free to do interesting things like change the style of entire subtrees in response to an event that was triggered lower down in the hierarchy.

Can you think of one situation where you would want to prevent the browser from submitting a form after a user has clicked on the ‘submit’ button? How would you achieve this?

One example would be if we had some client side validation we wanted to take place before submitting a form to the server. For instance, we might want to validate that all the form fields are filled out, or that the values entered into these fields have a certain format (e.g. mm/dd/yyyy format for dates). When the form submit button is clicked and the validation routine runs, that routine can call the preventDefault method on the corresponding event object if it finds an error.

Homework

12.1 of Eloquent Javascript

Write a function asHTML which, when given a DOM node, produces a string representing the HTML text for that node and its children. You may ignore attributes, just show nodes as <nodename>. The escapeHTML function from chapter 10 is available to properly escape the content of text nodes.

Hint: Recursion!

function isTextNode(node) {
  return node.nodeType == 3;
}

function asHTML(node) {
  // we’re done recursing
  if (node.childNodes.length === 0) {
    if (isTextNode(node)) {
      return node.nodeValue;
      // This is unavailable in the jsFiddle environment 
      //return escapeHTML(node.nodeValue);
    }
    else {
      return "<" + node.nodeName + ">";
    }
  }
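  var returnString = "<" + node.nodeName + ">";
  for (var i = 0; i < node.childNodes.length; i++) {
    returnString += asHTML(node.childNodes[i]) + "\n";
  }
  // close the element once all of its children have been written
  returnString += "</" + node.nodeName + ">";
  return returnString;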
}

alert(asHTML(document.body));

JSFiddle example

12.2 of Eloquent Javascript

Write the convenient function removeElement which removes the DOM node it is given as an argument from its parent node.

function removeElement(node) {
    node.parentNode.removeChild(node);
}

See an example

13.1 of Eloquent Javascript

Write a function called registerEventHandler to wrap the incompatibilities of these two models. It takes three arguments: first a DOM node that the handler should be attached to, then the name of the event type, such as “click” or “keypress”, and finally the handler function.

To determine which method should be called, look for the methods themselves ― if the DOM node has a method called attachEvent, you may assume that this is the correct method. Note that this is much preferable to directly checking whether the browser is Internet Explorer. If a new browser arrives which uses Internet Explorer’s model, or Internet Explorer suddenly switches to the standard model, the code will still work. Both are rather unlikely, of course, but doing something in a smart way never hurts.

// eventname is the name of the event without the ‘on’ prefix.  e.g.
// to register for a click event, the eventname value should be "click"
function registerEventHandler(node, eventname, handler) {
    // node has an attachEvent method; use that.  This is how ie works
    if (node.attachEvent) {
        node.attachEvent("on" + eventname, handler);
    }
    // Mozilla model
    else if (node.addEventListener) {
        // false - use bubble up rather than trickle down
        node.addEventListener(eventname, handler, false);
    }
    else {
        node["on" + eventname] = handler;
    }
}

JSFiddle example – when mouse enters element, it becomes bold. After it leaves, it becomes normal

Create an HTML page and some Javascript to allow a user to add n numbers.

First display a simple form with the question “How many numbers do you want to add (max is 10)”. The user should enter a number between 2 and 10 and click on a button in the form. You have to validate the answer. If the user has entered a correct value (between 2 and 10), then dynamically create a form with n text input fields and an “Add” button. Once the form is displayed the user will enter n numbers in the n input fields and when they click on the “Add” button, dynamically create a span element with the result. You will have to perform validation on the values entered in the input fields to make sure that they are numbers. If they are not numbers, display an alert dialogue with an error message.

JSFiddle solution

/*#p2pu-Jan2011-javascript101*/

Categories: javascript, programming Tags: dom, event handling, html, javascript, js

TextMate – Introduction to Language Grammars: How to add source code syntax highlighting embedded in HTML

February 8, 2011 i82much 7 comments

I’ve blogged about TextMate a few times in the past, and with good reason – it’s an extremely versatile, lightweight, powerful text editor for the Mac. One great feature of TextMate is its extreme customizability. Today I’m going to show how to modify one of the TextMate language files in order to add support for Java code within HTML text.

Why is this useful? My workflow for producing blog posts is often to write the post in TextMate using the Markdown markup language, which I then convert to HTML. WordPress has the ability to syntax highlight and provide a nice monospaced version of sourcecode within a post if it’s delimited by [sourcecode] and [/sourcecode] tags. While the sourcecode comes out fine in the final post, it would be nice to have the syntax highlighting show up from within the Markdown view (i.e. while I am composing a blog post). Let’s get started by looking at how language grammars work in TextMate.

Introduction to Language Grammar Editing

The language support in TextMate is extremely powerful, but it’s a little complicated to get started. In essence, a language defines a series of rules mapping patterns to scopes. For instance, the Java language grammar defines a scope for comments, a scope for control characters, and so on and so forth. The scope is extremely important for many reasons. A few of them are

The scope determines whether text is spellchecked or not (a top level scope of source is not spell checked; one that is text will be)
It provides syntax highlighting, as certain scopes are associated with certain colors.
Snippets can be targeted to only run when within a certain scope. (See this article on Scope selectors for more.) For instance, all the Java snippets are defined as only being active in the source.java scope.

An example of a Java snippet that's only accessible when the cursor is within something identified as source.java

As an aside, you might wonder why the scope is called source.java as opposed to java.source. The reason is that some scope selectors can target the more general case (source), whereas those concerned with Java can target the more specific scope (source.java).

Since someone has already done the hard work of creating a language definition for Java and of creating all of the snippets that support it, we want to leverage this body of work. All we need to do is ensure that text between the Java sourcecode tags is considered to be part of the source.java scope, and everything will just work.

First, let us look at a sample grammar file. Open up the HTML language definition file by going to Bundles -> Bundle Editor -> Edit Languages, or via the shortcut ⌃ ⌥ ⌘L, and choose the HTML option. You’ll be presented with a rather inscrutable, unstyled document to the right. The first thing you should do, and which I found out the hard way, is copy all that text and paste it into a new document.

Edit Languages

Edit HTML language

When you paste the text into the document, the text is unstyled and interpreted as plain text. In order to force TextMate to interpret this as a language grammar, you must click the item in the lower middle that says “Plain Text” and choose “Language Grammar” from the dropdown box. The document should look a lot nicer after this step:

Plain Text
After changing to Language Grammar

Take a look through the grammar, but don’t get bogged down in the details. The important thing to look at is the list of patterns defined. Here’s just a small section:

    patterns = (
        {   name = 'meta.tag.any.html';
            begin = '(<)([a-zA-Z0-9:]++)(?=[^>]*></\2>)';
            end = '(>(<)/)(\2)(>)';
            beginCaptures = {
                1 = { name = 'punctuation.definition.tag.html'; };
                2 = { name = 'entity.name.tag.html'; };
            };
            endCaptures = {
                1 = { name = 'punctuation.definition.tag.html'; };
                2 = { name = 'meta.scope.between-tag-pair.html'; };
                3 = { name = 'entity.name.tag.html'; };
                4 = { name = 'punctuation.definition.tag.html'; };
            };
            patterns = ( { include = '#tag-stuff'; } );
        }

This is the first pattern that will attempt to match. You don’t need to understand all of it, but you should understand that the parentheses in the regular expressions denote capturing groups, which are then referenced in the beginCaptures and endCaptures tags. These assign scopes to the various captured groups. Note too that we can recursively include patterns (via the include = '#tag-stuff' line) which assign scope to various parts of the matched text. This allows us to define a pattern one time and reference it in multiple places, which cuts down on code duplications.

If you look through the HTML grammar, you’ll notice that some embedded code is automatically detected and set to have the matching text use the corresponding language:

ruby = {
    patterns = (
        {   name = 'comment.block.erb';
            begin = '<%+#';
            end = '%>';
            captures = { 0 = { name = 'punctuation.definition.comment.erb'; }; };
        },

Here, any time the <%# %> tag pair is seen, the entire block is assigned the scope comment.block.erb, and the delimiters themselves are captured as punctuation.definition.comment.erb. This has the effect of distinguishing the block from surrounding text. You can see this in action in the following screenshot:

comment.block.erb scope

In addition to the fact that the ERB snippet is syntax highlighted, take note of the popup in the screenshot showing “text.html.basic” and “comment.block.erb”. At any point in any TextMate file, you can hit ⌃ ⇧P (Control Shift P) to get the current scope of the cursor. This is extremely useful for debugging why certain elements are not being selected or assigned the scope you think they are.

Adding Java support

While using a TextMate window to edit the grammar is extremely nice, unfortunately you cannot test your changes interactively here. You must copy and paste the contents back to the original grammar window, overwriting the contents, and then press Test. This will reload the grammar and you will see the change reflected in any window using that grammar currently.

With that in mind, let’s add the support for embedding Java within our Markdown blog posts.

The basic pattern is pretty simple:

    {   name = 'source.java';
        comment = 'Use Java grammar';
        begin = '\';
        end = '\[/sourcecode\]';
        patterns = ( { include = 'source.java'; } );
    }

I look for the literal opening sourcecode tag to start the pattern, and then the literal string [/sourcecode] to end it. I have to escape the brackets because they have a special meaning within regular expressions ([aeiou] matches any vowel, while \[aeiou\] matches the literal string [aeiou]).

Because this pattern is added at the top of the patterns list, it runs before any of the others. (Remember, we have to actually add it to the HTML grammar within the Bundle Editor, not just the TextMate window with the grammar inside of it.) Once the line is added and you press Test, the Java highlighting begins to work.
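
The bracket escaping is easy to check outside of TextMate. Here is a small sketch using Python’s re module; TextMate uses a different regular expression engine, so treat this only as an illustration of the character class versus literal bracket distinction. The sample text is made up.

```python
import re

text = 'before [sourcecode] code [/sourcecode] after'

# Unescaped brackets form a character class: this matches a single vowel.
vowel = re.search(r'[aeiou]', text).group()            # 'e'

# Escaped brackets match themselves.
closing = re.search(r'\[/sourcecode\]', text).group()  # '[/sourcecode]'

# re.escape adds the backslashes for you.
escaped = re.escape('[/sourcecode]')

print(vowel, closing, escaped)
```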

Here’s what a snippet of Java embedded in a Markdown blog post looked like without this change:

And after:

Conclusion

Language support in TextMate is a very complex task, and one that cannot be adequately covered in a single post. I’ve shown here how to add a small snippet to the HTML grammar to allow syntax highlighting of sourcecode delimited by special blocks. This technique could be expanded to support any number of other programming languages.

The ability to customize TextMate through editing snippets and language grammars makes it extremely powerful. I hope this has only whetted your appetite to learn more. If it has, please see the macromates site which has more information about this.

Categories: Java, textmate, UI Tags: blog, grammar, html, java, language, markdown, meta, punctuation, source, syntax, syntax highlight, textmate, wordpress

Developmentality
