How to download your WordPress.com stats in CSV, JSON, or XML format
I wanted raw data about the popularity of the posts on this blog to better determine what topics I should write about. WordPress.com provides some nice aggregate stats, but I wanted more. After stumbling around the Internet for a while, I cobbled together a way to download my blog data in CSV, XML, or JSON format.
There are three steps:
- Get an API key
- Get your blog URL
- Construct the URL to download the data
Get an API key
Akismet is WordPress.com’s anti-spam solution. Register for an Akismet API key at http://akismet.com/wordpress/ by clicking on “Get an Akismet API key”.
Sign up for an account. If you choose the personal blog option, you can drag the slider all the way to the left and register for free. If you value the service that Akismet provides, you can pay more. When you complete the signup flow, you will be given a 12-digit ID. Copy it down; this is your API key.
Get your blog URL
Copy the full URL of your blog, minus the leading https://. For me this is developmentality.wordpress.com.
Construct the URL
There is a limited API for downloading your data at the following URL:
http://stats.wordpress.com/csv.php
View this in a browser to see what the API parameters are.
Construct the URL:
http://stats.wordpress.com/csv.php?api_key=<api_key>&blog_uri=<blog_uri>
View this URL in the browser (or via wget / curl) and you should see the view data.
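If you would rather build the URL in a script than by hand, here is a minimal Python sketch. The function name build_stats_url and the placeholder key are my own inventions for illustration; the endpoint and the api_key, blog_uri, table, format, and days parameters are the ones described in this post.

```python
from urllib.parse import urlencode

STATS_ENDPOINT = "http://stats.wordpress.com/csv.php"


def build_stats_url(api_key, blog_uri, table="views", fmt="csv", days=None):
    """Return the stats URL for one table in one output format."""
    params = {
        "api_key": api_key,
        "blog_uri": blog_uri,
        "table": table,
        "format": fmt,
    }
    # Only send days when the caller asks for a non-default window.
    if days is not None:
        params["days"] = days
    return STATS_ENDPOINT + "?" + urlencode(params)


# "0123456789ab" is a made-up placeholder, not a real key.
url = build_stats_url(
    "0123456789ab",
    "developmentality.wordpress.com",
    table="postviews",
    fmt="json",
    days=7,
)
print(url)
```

The resulting string can be pasted into a browser or handed to wget / curl as described above.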
There are multiple data sources. From the documentation:
table (String): one of views, postviews, referrers, referrers_grouped, searchterms, clicks, videoplays
Here is some sample data from each table. Change the format param from csv to json or xml to get the data in different formats.
views
CSV
"date","views"
"2013-12-31",118
JSON
[{"date":"2010-02-05","views": 46}]
XML
<views>
<day date="2014-01-01">112</day>
</views>
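As a rough sketch of what you can do with the views CSV once it is downloaded, the snippet below reads it with Python's standard csv module. The helper name parse_views_csv is hypothetical, and the sample text is just the two CSV lines shown above.

```python
import csv
import io


def parse_views_csv(text):
    """Turn the views CSV into a list of (date, views) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    # The header row supplies the "date" and "views" keys.
    return [(row["date"], int(row["views"])) for row in reader]


sample = '"date","views"\n"2013-12-31",118\n'
rows = parse_views_csv(sample)
print(rows)
```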
postviews
CSV
"date","post_id","post_title","post_permalink","views"
"2014-01-28",369876479,"Three ways of creating dictionaries in Python","https://developmentality.wordpress.com/2012/03/30/three-ways-of-creating-dictionaries-in-python/",46
JSON
[{"date":"2014-01-29","postviews":[{"post_id":369876479,"post_title":"Three ways of creating dictionaries in Python","permalink":"http:\/\/developmentality.wordpress.com\/2012\/03\/30\/three-ways-of-creating-dictionaries-in-python\/","views":22},{"post_id":369875635,"post_title":"R - Sorting a data frame by the contents of a column","permalink":"http:\/\/developmentality.wordpress.com\/2010\/02\/12\/r-sorting-a-data-frame-by-the-contents-of-a-column\/","views":16}]}]
XML
<postviews>
<day date="2014-01-30"></day>
<day date="2014-01-29">
<post id="369876479" title="Three ways of creating dictionaries in Python" url="https://developmentality.wordpress.com/2012/03/30/three-ways-of-creating-dictionaries-in-python/">54</post>
</day>
</postviews>
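The JSON form of postviews nests the posts inside each day, so it usually needs flattening before analysis. Here is a minimal sketch of that, assuming the structure matches the sample above; flatten_postviews is a made-up helper name, and the sample is trimmed (permalinks removed) to keep it short.

```python
import json


def flatten_postviews(text):
    """Flatten the nested postviews JSON into (date, post_id, title, views) rows."""
    rows = []
    for day in json.loads(text):
        # Days with no traffic may carry an empty list.
        for post in day.get("postviews", []):
            rows.append(
                (day["date"], post["post_id"], post["post_title"], post["views"])
            )
    return rows


sample = r'''[
  {"date": "2014-01-29", "postviews": [
    {"post_id": 369876479,
     "post_title": "Three ways of creating dictionaries in Python",
     "views": 22},
    {"post_id": 369875635,
     "post_title": "R - Sorting a data frame by the contents of a column",
     "views": 16}
  ]}
]'''

rows = flatten_postviews(sample)
for row in rows:
    print(row)
```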
referrers
CSV
"date","referrer","views"
"2014-01-28","http://www.google.com/",63
JSON
[{"date":"2014-01-30","referrers":[]},{"date":"2014-01-29","referrers":[{"referrer":"http:\/\/www.google.com\/","views":66},{"referrer":"www.google.com\/search","views":27},{"referrer":"www.google.co.uk","views":10}]}]
XML
<referrers>
<day date="2014-01-30"></day>
<day date="2014-01-29">
<referrer value="http://www.google.com/" count="" limit="100">66</referrer>
</day>
</referrers>
referrers_grouped
CSV
"date","group","group_name","referrer","views"
"-","Search Engines","Search Engines","http://www.google.com/",1256
JSON
[{"date":"-","referrers_grouped":[{"referrers_grouped":"Search Engines","views":{"http:\/\/www.google.com\/":1305}}]}]
XML
<referrers_grouped>
<day date="-">
<group domain="Search Engines" name="Search Engines">
<referrer value="http://www.google.com/">1305</referrer>
</group>
</day>
</referrers_grouped>
Dates aren’t included for this table, so each count is the sum over the past N days, defaulting to 30. To change this, set the days URL parameter:
http://stats.wordpress.com/csv.php?api_key=<api_key>&blog_uri=<blog_uri>&table=referrers_grouped&days=<num_days>
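Note that the JSON for this table is shaped differently from the others: each group holds a views object mapping referrer URLs to counts. A small sketch of totalling views per group, assuming the response looks like the sample above (group_totals is a hypothetical helper name):

```python
import json


def group_totals(text):
    """Sum the views of every referrer within each referrer group."""
    totals = {}
    for period in json.loads(text):
        for group in period["referrers_grouped"]:
            # The group name is stored under the same key as the table name.
            name = group["referrers_grouped"]
            totals[name] = totals.get(name, 0) + sum(group["views"].values())
    return totals


sample = r'[{"date":"-","referrers_grouped":[{"referrers_grouped":"Search Engines","views":{"http:\/\/www.google.com\/":1305}}]}]'
totals = group_totals(sample)
print(totals)
```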
searchterms
CSV
"date","searchterm","views"
"2014-01-28","encrypted_search_terms",190
JSON
[{"date":"2014-01-30","searchterms":[]},{"date":"2014-01-29","searchterms":[{"searchterm":"encrypted_search_terms","views":159},{"searchterm":"dynamically load property file in mule","views":2}]}]
XML
<searchterms>
<day date="2014-01-30"></day>
<day date="2014-01-29">
<searchterm value="encrypted_search_terms" count="" limit="100">159</searchterm>
<searchterm value="dynamically load property file in mule" count="" limit="100">2</searchterm>
</day>
</searchterms>
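If you prefer the XML output, Python's standard xml.etree.ElementTree module can read it. This is only a sketch based on the sample above; parse_searchterms_xml is a made-up name, and I am assuming the element text is always the view count.

```python
import xml.etree.ElementTree as ET


def parse_searchterms_xml(text):
    """Return (date, search term, views) tuples from the searchterms XML."""
    root = ET.fromstring(text)
    rows = []
    for day in root.findall("day"):
        # Days without any search terms simply have no child elements.
        for term in day.findall("searchterm"):
            rows.append((day.get("date"), term.get("value"), int(term.text)))
    return rows


sample = """<searchterms>
<day date="2014-01-30"></day>
<day date="2014-01-29">
<searchterm value="encrypted_search_terms" count="" limit="100">159</searchterm>
<searchterm value="dynamically load property file in mule" count="" limit="100">2</searchterm>
</day>
</searchterms>"""

rows = parse_searchterms_xml(sample)
for row in rows:
    print(row)
```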
clicks
CSV
"date","click","views"
"2014-01-28","http://grab.by/grabs/b608b9c315119ca07a1f7083aabbb9c7.png",3
JSON
[{"date":"2014-01-30","clicks":[]},{"date":"2014-01-29","clicks":[{"click":"http:\/\/www.anddev.org\/extended_checkbox_list__extension_of_checkbox_text_list_tu-t5734.html","views":2},{"click":"http:\/\/android.amberfog.com\/?p=296","views":2}]}]
XML
<clicks>
<day date="2014-01-30"></day>
<day date="2014-01-29">
<click value="http://www.anddev.org/extended_checkbox_list__extension_of_checkbox_text_list_tu-t5734.html" count="" limit="100">2</click>
</day>
</clicks>
videoplays
I am not sure what this format is as I have no video plays on my blog.
Conclusion
I hope you find this useful. I’ll make another post later showing how to crunch some of this data and extract meaningful information from the raw data.
Human fallibility – static analysis tools
The theme of this post is human fallibility: no one’s perfect and we’re bound to make mistakes while coding. The following tools can help statically analyze source or markup and find mistakes early.
Python
I am a huge fan of Python as a scripting language, as its syntax and language features allow you to code quickly and with minimal fuss. (I will definitely use it as a springboard for future blog discussion, especially with respect to its differences from Java.) I am used to statically typed, compiled languages like the aforementioned Java, where the compiler will catch typos and uses of uninitialized variables; not having this safety net in Python always makes me a bit hesitant to use it for more than quick scripts and small programming tasks. (Clearly Python is well-suited to large-scale production environments; this is more my hangup and lack of expertise in the language than anything else.)
Enter PyChecker, an open-source project that aims to detect some of the most common coding mistakes, including references to variables before assignment, passing wrong numbers of arguments to functions, and calling methods that don’t exist. It won’t find all your mistakes, but if you have any long-running Python scripts, you’d much prefer to catch a typo before you start running than 90% through the computation.
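To make the problem concrete, here is a contrived sketch of the kind of typo I mean. The function and its misspelled variable are invented for illustration, and I am not showing PyChecker's actual output; the point is only that Python itself does not complain until the faulty branch runs.

```python
def summarize(values):
    """Describe the sum of values; contains a deliberate typo."""
    total = 0
    for value in values:
        total += value
    if total > 100:
        # Typo: "totl" is never defined, but Python only notices
        # when this branch actually executes.
        return "large: %d" % totl
    return "small: %d" % total


# The buggy branch is not reached here, so the call succeeds.
print(summarize([1, 2, 3]))

# A larger input finally hits the typo, long after the script started.
try:
    summarize([60, 70])
    failed = False
except NameError as error:
    failed = True
    print("failed at run time:", error)
```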
JSON
JSON is an alternative to XML as a “lightweight data-interchange format”. Unlike XML with its opening and closing angle brackets, JSON has a very clean syntax with few extraneous marks. Here’s an example JSON file:
{
    "number": 1,
    "array": [ 5, 6.7, "string", [ "nested list" ] ],
    "birthdayMap": {
        "Nick": "3/24",
        "Andrew": "12/1"
    }
}
I was working on a project that used JSON to represent its data when I ran into problems with my hand-generated JSON: I had omitted brackets, left quotation marks unclosed, and made other silly mistakes. The Java library I was using to parse the JSON read in the whole file as a String, meaning all the contextual information was lost. When an error occurred, I was left with a cryptic error message like “syntax error, unexpected $end, expecting ‘}’ at character 2752” with no idea where in the file the error lay.
Thanks to my coworker Dave, I found the excellent tool JSONLint, which not only highlights the exact line and location of your syntax error but also reformats your JSON with a consistent amount of indentation for each nested level. JSONLint is indispensable if you’re fat-fingering your JSON code.
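If you want a similar check inside a script, Python's standard json module reports the line and column of a syntax error. This is a sketch of that idea, not a description of how JSONLint works; find_json_error is a made-up helper name and the broken sample is my own.

```python
import json


def find_json_error(text):
    """Return (line, column, message) for the first syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        return (error.lineno, error.colno, error.msg)
    return None


# The closing "]" of the array is missing.
broken = '{\n  "number": 1,\n  "array": [5, 6.7, "string"\n}\n'
valid = '{"number": 1, "array": [5, 6.7, "string"]}'

print(find_json_error(broken))
print(find_json_error(valid))
```

Once the text parses, json.dumps(json.loads(text), indent=4) will print it back out with consistent indentation.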
Java
PMD is a plugin for NetBeans, Eclipse, and a host of other Java IDEs and standard text editors that warns you of bad coding practices in your source code and alerts you to potential errors. There are a variety of rules, specified in either XPath notation or Java code, that can be turned on and off at will; for instance, if you are doing a lot of coding with interfaces to C and need to use shorts and bytes, you probably won’t want the “Don’t use shorts” warning popping up on every line where you use a short.
Some of the most useful rules I’ve found are the fall through in switch statements,
Not all of the rules are cut-and-dried, which the website acknowledges with the addition of a Controversial Rules section. Some might be due to stylistic differences or just disagreements over whether or not certain constructs are bad practice. For instance,
OnlyOneReturn
Since: PMD 1.0
A method should have only one exit point, and that should be the last statement in the method.
This rule is defined by the following Java class: net.sourceforge.pmd.rules.design.OnlyOneReturnRule
Example:
public class OneReturnOnly1 {
    // Return type is String so that the example compiles.
    public String foo(int x) {
        if (x > 0) {
            return "hey"; // oops, multiple exit points!
        }
        return "hi";
    }
}
Doing a search for “java multiple exit points” reveals that there is definitely a lot of discussion as to best practices in this regard, hence its inclusion in the Controversial Rules section.
Hopefully you’ve gained at least one new tool from this post; if not, I’ll be posting more in days to come.