Exploratory Parsing

Recent advances in parsing combine context and backtracking into a single language. We use both to explore semi-structured text, pairing parser generation with experiment management and continuous visualization of partial results.
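As a rough illustration of the exploratory style, here is a minimal Python sketch. It is not the grammar language described here: it uses ordered regular-expression alternatives with a catch-all, so that every line is consumed and the tally shows how much of the text is still unexplained. The rule names, the patterns, and the sample lines in the usage note are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical ordered alternatives: the first pattern that matches wins,
# and the final catch-all guarantees that every line is consumed.
RULES = [
    ("key_value", re.compile(r"^\s*([A-Za-z][\w ]*?)\s*:\s*(.+)$")),
    ("blank", re.compile(r"^\s*$")),
    ("other", re.compile(r"^.*$")),
]

def classify(line):
    """Return the name of the first rule that matches the line."""
    for name, pattern in RULES:
        if pattern.match(line):
            return name
    return "other"

def explore(lines):
    """Tally which rule consumed each line and keep one sample per rule."""
    counts = Counter()
    samples = {}
    for line in lines:
        name = classify(line)
        counts[name] += 1
        samples.setdefault(name, line)  # first line seen for this rule
    return counts, samples
```

Calling explore() on a few made-up lines such as "Domain Name: example.com" and "%% notice text" returns a count per rule plus one sample line for each. A large "other" count with a telling sample suggests which alternative to add next, and re-running after each change is the experiment loop in miniature.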


We describe various server-resident datasets. For each, we give the record framing and information coding conventions as we know them, and suggest approaches to extracting additional features.


3,500,000 Articles from the English Wikipedia dump. ★

47,200,000 Surveys of Wikipedia article quality.


18,000,000 AboutUs domain pages.

30,000 Web Pages from .com zone file scrape. ★

3,000 Thick whois records. private


19,000 Sentences by Dickens.

43,000 Batch job steps. private


