[logon] HandOn release updates

Stephan Oepen oe at ifi.uio.no
Sun Nov 9 15:41:41 CET 2008


hei!

i just checked in two changes (into the HandOn pre-release CVS tree)
that might be relevant to some of you.

first, with emacs 22 (and the standard LOGON `dot.emacs'), i used to
have problems with /some/ Unicode characters: when entered into Lisp
through emacs (i.e. ELI), characters like |…| would be mangled, while
more common Unicode characters, e.g. accented characters, came through
fine.  i now think i have tracked this problem to a bug in the ELI
code, which i plan to report to Franz.  before i do so, could i ask
people to run `make update' in their LOGON trees (at least those still
working with a copy checked out from CVS, i.e. the active developers),
make sure they are running a stock LOGON configuration, and test
whether they can input all sorts of Unicode characters through emacs
okay?  i would be grateful for experience reports here.
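
for anyone putting together test input, here is a minimal python sketch
(not part of LOGON or ELI) that prints a few sample characters with
their code points and UTF-8 widths.  the sample set and the name
`describe' are made up for illustration; only |…| comes from the report
above, and whether encoding width has anything to do with the ELI
problem is an untested assumption.

```python
import unicodedata

# hypothetical sample set: only U+2026 (the ellipsis) is taken from the
# report; the other characters are arbitrary choices for comparison
SAMPLES = ["\u00e9", "\u00f8", "\u00fc", "\u2026", "\u2013", "\u20ac", "\u65e5"]


def describe(char):
    """Return (code point, UTF-8 byte count, Unicode name) for one character."""
    code = f"U+{ord(char):04X}"
    width = len(char.encode("utf-8"))
    # the default avoids a ValueError for characters without a name
    name = unicodedata.name(char, "<unnamed>")
    return (code, width, name)


if __name__ == "__main__":
    # one tab-separated line per character, ready to paste from
    for char in SAMPLES:
        code, width, name = describe(char)
        print(f"{char}\t{code}\t{width} byte(s)\t{name}")
```

pasting each printed character into the Lisp buffer and noting which
ones come back intact would make for a compact experience report.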

second, [incr tsdb()] now allows hierarchical skeleton repositories,
i.e. directory trees, where sub-directories map to sub-menus in the
`File | Create' command.  for an example, please see

  lingo/lkb/src/tsdb/skeletons/english/

(tsdb :skeletons) lists this english directory structure as follows:

  Tourism Web Sites (HandOn) (`handon'): 13 skeletons:
  | HandOn Corpus from http://www.bike-norway.com (`bike'): 3154 items;
  | HandOn Corpus from DNT Activities (`dnt.activity'): 621 items;
  | HandOn Corpus from DNT Articles (`dnt.article'): 552 items;
  | HandOn Corpus from DNT Cabins (`dnt.cabin'): 891 items;
  | HandOn Corpus from DNT Overview (`dnt.index'): 443 items;
  | HandOn Corpus from DNT Areas (`dnt.location'): 620 items;
  | HandOn Corpus from DNT Trips (`dnt.trip'): 1039 items;
  | HandOn Corpus from http://www.golinfo.no (`gol'): 2863 items;
  | HandOn Corpus from Willassen Overview (`guide.general'): 324 items;
  | HandOn Corpus from Willassen Areas (`guide.location'): 1441 items;
  | HandOn Corpus from http://www.romsdalsalpene.no (`romsdal'): 466 items;
  | HandOn Corpus from http://www.tilltopps.com (`tiltopps'): 9545 items;
  | HandOn Corpus from http://www.turistveg.no (`turistveg'): 248 items;
  Tourism Brochures (LOGON) (`logon'): 20 skeletons:
  | LOGON Harmonization Test Suite (`harmony'): 62 items;
  | LOGON First Development Corpus (`hike'): 330 items;
  | LOGON Jotunheimen Corpus (All Sections) (`jh'): 6431 items;
  | LOGON Jotunheimen Corpus (Section 0) (`jh0'): 261 items;
  | LOGON Jotunheimen Corpus (Section 1) (`jh1'): 1353 items;
  | LOGON Jotunheimen Corpus (Section 2) (`jh2'): 1307 items;
  | LOGON Jotunheimen Corpus (Section 3) (`jh3'): 1443 items;
  | LOGON Jotunheimen Corpus (Section 4) (`jh4'): 1603 items;
  | LOGON Jotunheimen Corpus (Section 5) (`jh5'): 464 items;
  | LOGON Jotunheimen Test Corpus (Known-Vocabulary) (`jhk'): 250 items;
  | LOGON Development Corpus (All Segments) (`jhpstg'): 9410 items;
  | LOGON Jotunheimen Test Corpus (Unknown-Vocabulary) (`jhu'): 294 items;
  | LOGON Preikestolen Corpus (`ps'): 965 items;
  | LOGON Preikestolen Test Corpus (Known-Vocabulary) (`psk'): 45 items;
  | LOGON Preikestolen Test Corpus (Unknown-Vocabulary) (`psu'): 45 items;
  | LOGON Hiking Treebank (Rondane) (`rondane'): 1424 items;
  | LOGON Embryonic JaEn MT Test Suite (`shiken'): 18 items;
  | LOGON Turglede Corpus (`tg'): 2014 items;
  | LOGON Turglede Test Corpus (Known-Vocabulary) (`tgk'): 90 items;
  | LOGON Turglede Test Corpus (Unknown-Vocabulary) (`tgu'): 90 items;
  Penn Treebank (WSJ) (`ptb'): 26 skeletons:
  | Wall Street Journal (PTB; PARC DB) (`parc'): 700 items;
  | Wall Street Journal (PTB; Section 0) (`wsj00'): 1921 items;
  | Wall Street Journal (PTB; Section 1) (`wsj01'): 1993 items;
  | Wall Street Journal (PTB; Section 2) (`wsj02'): 1989 items;
  | Wall Street Journal (PTB; Section 3) (`wsj03'): 1480 items;
  | Wall Street Journal (PTB; Section 4) (`wsj04'): 2265 items;
  | Wall Street Journal (PTB; Section 5) (`wsj05'): 2134 items;
  | Wall Street Journal (PTB; Section 6) (`wsj06'): 1827 items;
  | Wall Street Journal (PTB; Section 7) (`wsj07'): 2163 items;
  | Wall Street Journal (PTB; Section 8) (`wsj08'): 477 items;
  | Wall Street Journal (PTB; Section 9) (`wsj09'): 2069 items;
  | Wall Street Journal (PTB; Section 10) (`wsj10'): 1942 items;
  | Wall Street Journal (PTB; Section 11) (`wsj11'): 2236 items;
  | Wall Street Journal (PTB; Section 12) (`wsj12'): 2124 items;
  | Wall Street Journal (PTB; Section 13) (`wsj13'): 2481 items;
  | Wall Street Journal (PTB; Section 14) (`wsj14'): 2182 items;
  | Wall Street Journal (PTB; Section 15) (`wsj15'): 2118 items;
  | Wall Street Journal (PTB; Section 16) (`wsj16'): 2785 items;
  | Wall Street Journal (PTB; Section 17) (`wsj17'): 1771 items;
  | Wall Street Journal (PTB; Section 18) (`wsj18'): 2262 items;
  | Wall Street Journal (PTB; Section 19) (`wsj19'): 1844 items;
  | Wall Street Journal (PTB; Section 20) (`wsj20'): 2012 items;
  | Wall Street Journal (PTB; Section 21) (`wsj21'): 1671 items;
  | Wall Street Journal (PTB; Section 22) (`wsj22'): 1700 items;
  | Wall Street Journal (PTB; Section 23) (`wsj23'): 2416 items;
  | Wall Street Journal (PTB; Section 24) (`wsj24'): 1346 items;
  Scheduling Dialogues (VerbMobil) (`verbmobil'): 11 skeletons:
  | Aged VerbMobil Data (`aged'): 96 items;
  | Balanced Blend of Corpora Extracts (`blend'): 2119 items;
  | Balanced Fuse of Corpora Extracts (`fuse'): 2363 items;
  | VerbMobil CD # 13 (`vm13'): 3408 items;
  | VerbMobil CD # 31 (`vm31'): 3914 items;
  | VerbMobil CD # 32 (`vm32'): 1034 items;
  | VerbMobil CD # 06 (`vm6'): 4037 items;
  | VerbMobil 97 (`vm97'): 100 items;
  | VerbMobil 97 (Partials) (`vm97p'): 252 items;
  | VerbMobil 98 (`vm98'): 347 items;
  Scholarly Literatur (WeScience) (`wescience'): 16 skeletons:
  | WeScience Articles 1 -- 4 (`ws01'): 795 items;
  | WeScience Articles 5 -- 15 (`ws02'): 946 items;
  | WeScience Articles 16 -- 23 (`ws03'): 916 items;
  | WeScience Articles 24 -- 28 (`ws04'): 988 items;
  | WeScience Articles 29 -- 31 (`ws05'): 908 items;
  | WeScience Articles 32 -- 37 (`ws06'): 889 items;
  | WeScience Articles 38 -- 42 (`ws07'): 805 items;
  | WeScience Articles 43 -- 48 (`ws08'): 882 items;
  | WeScience Articles 49 -- 55 (`ws09'): 935 items;
  | WeScience Articles 56 -- 63 (`ws10'): 910 items;
  | WeScience Articles 64 -- 70 (`ws11'): 744 items;
  | WeScience Articles 71 -- 78 (`ws12'): 996 items;
  | WeScience Articles 79 -- 88 (`ws13'): 891 items;
  | WeScience Articles 89 -- 91 (`ws14'): 601 items;
  | WeScience Articles 92 -- 96 (`ws14'): 601 items;
  | WeScience Articles 97 -- 100 (`ws14'): 601 items;
  ECommerce Email (YY) (`yy'): 5 skeletons:
  | DT Cell Phone Groups (Development Section) (`cell'): 14970 items;
  | Ecommerce Email (Order Cancellation) (`ecoc'): 6217 items;
  | Ecommerce Email (Order Status) (`ecos'): 7880 items;
  | Ecommerce Email (Product Availability) (`ecpa'): 8773 items;
  | Ecommerce Email (Product Return) (`ecpr'): 5939 items;
  The Cathedral and the Bazaar (`cb'): 769 items;
  CSLI (LinGO) Test Suite (`csli'): 1348 items;
  TSNLP English Test Suite (`english'): 4612 items;
  DELPH-IN MRS Test Suite (`mrs'): 107 items;
  TREC QA Questions (Ninth Conference) (`trec9'): 693 items;

i expect to put this to use for the japanese `tanaka' skeletons too.
if other languages want to re-arrange their skeleton layouts (in the
[incr tsdb()] distribution), please let me know.
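
to illustrate the mapping from directory trees to menus, here is a
rough python sketch; it is not the [incr tsdb()] implementation.  it
assumes, purely for illustration, that a directory holding a file named
`item' is a skeleton and that any other directory is a sub-menu;
`build_menu' and `render' are hypothetical names.

```python
import os


def build_menu(root):
    """Map a directory tree to a nested dict: skeletons map to None,
    sub-directories (sub-menus) map to a nested dict of their own."""
    menu = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            continue  # plain files at this level are not menu entries
        if os.path.isfile(os.path.join(path, "item")):
            menu[name] = None  # assumed marker file: treat as a skeleton
        else:
            menu[name] = build_menu(path)  # recurse: treat as a sub-menu
    return menu


def render(menu, depth=0):
    """Flatten the nested menu into indented lines, one `| ' per level."""
    lines = []
    for name, sub in menu.items():
        lines.append("| " * depth + name)
        if sub is not None:
            lines.extend(render(sub, depth + 1))
    return lines


if __name__ == "__main__":
    import sys

    # directory to scan: first command-line argument, else the current one
    start = sys.argv[1] if len(sys.argv) > 1 else "."
    for line in render(build_menu(start)):
        print(line)
```

the recursion means that nesting depth is not limited to two levels,
though the listing above only uses one level of sub-menus.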

--- i have been saying we should finalize this release soon, so that i
can then finally retire the CVS repository, enabling development on the
new SVN trunk.  my list of remaining things to get into the release is
relatively short by now, and i would like to stop having to propagate
last-minute CVS changes manually to the HandOn pre-release in SVN.

could i suggest the following schedule for finalizing this transition:

  Friday, 14-nov: final changes committed to CVS;
  Monday, 17-nov: availability of SVN release candidate;
  Friday, 21-nov: SVN test reports from all developers;
  Monday, 24-nov: public release of HandOn SVN

between 17-nov and 21-nov i would like to ask all developers to obtain the
release candidate from SVN and test relevant functionality with `their'
grammars and language pairs.

does that schedule work for everyone, francis, eric, and berthold?

                                                   best wishes  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at ifi.uio.no; oe at csli.stanford.edu; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


