WORKSHOPS IN STATISTICAL METHODS FOR LINGUISTIC ANALYSIS

Sponsored by the AMERICAN DIALECT SOCIETY

January 2, 1997

Chicago, Illinois

Sheraton Chicago Hotel and Towers

LINGUISTIC SOCIETY OF AMERICA ANNUAL MEETING

The American Dialect Society, to celebrate its first general meeting held

jointly with the LSA, is sponsoring six workshops on the quantitative

(statistical) treatment of a variety of kinds of linguistic data. Each

workshop, conducted by an internationally recognized authority, will be

presented twice, and participants may attend the full day's sessions,

attending as many as four different workshops.

These workshops are open to all who register for the LSA meeting and are free

of charge (except for a small fee for some workshops in which materials are

distributed).

There will be a limit on participation in these workshops. If you want to be

assured a place, please send a letter, enclosing a self-addressed stamped

postcard, to

American Dialect Society

Allan Metcalf, Executive Secretary

English Department

MacMurray College

Jacksonville, Illinois 62650

or an e-mail message to him at

AAllan@aol.com

For each workshop you wish to attend, please list the name of the presenter

and the time (e.g., Kretzschmar 8:00, Finegan 1:30). Do not forget the time,

since each workshop will be given twice.

The workshops were organized by Dennis Preston of Michigan State University.

Schedule:

8:00-10:00     Kretzschmar   Cichocki      Berdan

10:30-12:30    Bayley        Labov         Berdan

1:30-3:30      Bayley        Finegan       Cichocki

4:00-6:00      Labov         Kretzschmar   Finegan

THE WORKSHOPS:

1) VARBRUL analysis of linguistic variation

Robert Bayley

University of Texas, San Antonio

This session will provide a rationale for and demonstration of the VARBRUL

computer programs (Pintzuk 1988; Rand and Sankoff 1990; Sankoff 1988). The

demonstration uses data from a study of consonant cluster reduction in

Mexican-American English (Bayley 1994) and relative pronoun choice in speech

and writing (Guy and Bayley 1995) to show the steps in the heuristic process

of hypothesis generation, testing, and revision as it is carried out with the

help of VARBRUL, including the following: 1) generating initial hypotheses to

account for observed variation; 2) coding the data for the potentially large

number of independent factors affecting variation; 3) conducting the initial

VARBRUL run and interpreting the factor probabilities generated; 4) recoding

the data to refine hypotheses on the basis of factor probabilities generated

in step 3; 5) testing the significance of individual factors and factor groups

by means of log likelihood estimation. In addition, the workshop will

consider several questions that are likely to arise when conducting a VARBRUL

analysis, including dealing with suspected interaction among factors and

choosing between competing analyses.
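
As a rough illustration of the significance testing in step 5, the following
Python sketch compares the log likelihood of a model with one rate per factor
against a model with a single overall rate. It is not VARBRUL itself, which
fits several factor groups at once; it covers only the one-group case, where
the fitted rates equal the observed proportions. The function names and the
counts are hypothetical and are not taken from the studies cited above.

```python
import math

def log_likelihood(cells):
    """Binomial log likelihood when each cell is fitted with its observed rate.

    cells: list of (applications, total) pairs, one pair per factor.
    """
    ll = 0.0
    for k, n in cells:
        p = k / n
        # Cells with p == 0 or p == 1 contribute 0 to the log likelihood.
        if 0 < p < 1:
            ll += k * math.log(p) + (n - k) * math.log(1 - p)
    return ll

def likelihood_ratio(cells):
    """Return (G2, df) for 'one rate per factor' versus 'one overall rate'."""
    k_all = sum(k for k, _ in cells)
    n_all = sum(n for _, n in cells)
    ll_null = log_likelihood([(k_all, n_all)])
    ll_full = log_likelihood(cells)
    return 2 * (ll_full - ll_null), len(cells) - 1

# Hypothetical counts: (tokens showing cluster reduction, total tokens)
# for two factors in one factor group. Illustrative only.
counts = [(60, 100), (40, 100)]
g2, df = likelihood_ratio(counts)
# 3.841 is the .05 critical value of chi-square with 1 degree of freedom.
print(f"G2 = {g2:.3f}, df = {df}, significant at .05: {g2 > 3.841}")
```

A factor group whose removal does not raise G2 above the critical value for
its degrees of freedom would be a candidate for exclusion or recoding.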

2) The analysis of vowel systems

William Labov

University of Pennsylvania

This workshop will deal with the display and analysis of vowel formant data,

with particular emphasis on the study of change in progress, through use of

the Macintosh program PLOTNIK 03. Workshop participants should have a body of

formant measurements in hand, or the opportunity to acquire them, through the

use of such programs as Kay Elemetrics CSL, Eric Keller's Signalyze, GSW

Soundscope, or Cornell Ornithology Lab's Canary. The workshop will show how

vowel tokens are plotted, normalized, and automatically analyzed for

segmental environment; how relevant subsets of vowels may be selected,

plotted or highlighted; how means and standard deviations are plotted; how to

carry out t-tests on the difference of any two means; how subsets of vowels

may be plotted or highlighted by any combination of segmental environment,

stress, or style. Particular attention will be given to methods for

determining the extent to which vowel systems participate in the Northern

Cities Shift, the Southern Shift, the Canadian Shift, or the low back merger.

Participants will receive copies of PLOTNIK 03 along with a tutorial

and full documentation. PLOTNIK 03 includes several dozen features introduced

following the NWAVE 24 workshop with PLOTNIK 02, including adaptation to

other languages, shift from color to black and white, and the addition of

vectors from nuclei to glide targets. In addition, methods for superimposing

large numbers of vowel systems will be introduced through the use of the

program PLOTNIK MAJOR.
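
For readers unfamiliar with the t-test on the difference of two means
mentioned above, here is a minimal Python sketch. It uses Welch's
unequal-variance form, which is an assumption; this announcement does not say
which form PLOTNIK computes. The function name and the formant values are
hypothetical.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error of each mean: sample variance divided by n.
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Hypothetical F1 measurements (Hz) for one vowel in two environments.
f1_before_nasals = [610, 595, 640, 625, 600]
f1_elsewhere = [700, 720, 690, 735, 710]
t, df = welch_t(f1_before_nasals, f1_elsewhere)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The resulting t and df would then be checked against a t table or a
statistics package to obtain a p-value.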

3) Computer plotting and mapping of areal linguistic data

William A. Kretzschmar, Jr.

University of Georgia

This session will present a discussion of methods of computer plotting and

mapping of linguistic data drawn from American Linguistic Atlas surveys. We

will begin with the basic issues of the possible relationships between

linguistic data and geographical locations, and of the nature of GIS

(Geographical Information Systems). Computer plotting, and generalizations to

be made from observation of plots, will be illustrated with the Graphic

Plotter Grid from the Linguistic Atlas of the Gulf States, the LAMSASplot

program from the Linguistic Atlas of the Middle and South Atlantic States

(LAMSAS), and the LAMSAS Internet plotter. We will then consider use of

statistical procedures to assess geographical distribution of linguistic

features drawn from LAMSAS: t-test, chi-square, and multiple comparison for

fixed regions; spatial autocorrelation; and density estimation. Finally, we

will consider uses of GIS software to assist in visualization of

distributions.
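
To give a sense of what spatial autocorrelation measures, the sketch below
computes Moran's I, one common autocorrelation statistic, with simple binary
neighbor weights. Whether the workshop uses Moran's I or another statistic is
not stated here, so treat the choice as an assumption. The community labels,
values, and neighbor lists are hypothetical and are not LAMSAS data.

```python
def morans_i(values, neighbors):
    """Moran's I with binary (0/1) neighbor weights.

    values: dict mapping location id -> numeric value, e.g. the share of
            informants at that location who use a feature.
    neighbors: dict mapping location id -> list of neighboring location ids
               (assumed symmetric: if B is listed under A, A is under B).
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {loc: v - mean for loc, v in values.items()}
    # Sum of cross-products of deviations over all ordered neighbor pairs.
    cross = sum(dev[i] * dev[j] for i in neighbors for j in neighbors[i])
    total_weight = sum(len(js) for js in neighbors.values())
    denom = sum(d * d for d in dev.values())
    return (n / total_weight) * cross / denom

# Four hypothetical communities along a road: A - B - C - D.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
clustered = {"A": 1.0, "B": 1.0, "C": 0.0, "D": 0.0}
alternating = {"A": 1.0, "B": 0.0, "C": 1.0, "D": 0.0}
print(f"clustered:   I = {morans_i(clustered, chain):.3f}")
print(f"alternating: I = {morans_i(alternating, chain):.3f}")
```

Positive values indicate that neighboring locations tend to share similar
values; negative values indicate that neighbors tend to differ.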

4) Advanced multivariate analyses of linguistic data

Robert Berdan

California State University, Long Beach

This session will focus principally on logistic regression, the general

statistical approach underlying VARBRUL analyses. The generalized

application is particularly useful for data sets that are well described by

both categorical and continuous variables, a frequent situation both for

language acquisition and for historical data sets, in which time is best

considered as a continuous variable, but various linguistic and demographic

characteristics are categorical (or continuous). The SPSS implementation of

logistic regression will be demonstrated in the workshop. The workshop will

demonstrate the progression of analysis from text files to reportable

graphics and statistics. Topics considered will include optimizing the coding

of the data set, developing and testing hypotheses, evaluating competing

analyses, treating interactions among factors, and interpreting error and

reliability. We will also compare the assumption of continuous change over

time with that of discontinuity and restructuring. The SPSS graphics tools

will be explored both as analytic techniques and for reporting findings.

Where comparable, SPSS reporting will be converted to VARBRUL terms.
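
One plausible form of such a conversion is sketched below in Python: log-odds
coefficients for a single factor group are centered on their mean and passed
through the logistic function, giving probabilities on the 0-to-1 scale used
for VARBRUL factor weights. The unweighted centering is an assumption;
particular VARBRUL versions may center differently (for example, by token
counts). The function names and coefficient values are hypothetical.

```python
import math

def inverse_logit(x):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def factor_weights(log_odds):
    """Convert one factor group's log-odds coefficients to centered weights.

    log_odds: dict mapping factor -> coefficient, with the reference level
              of an indicator-coded group entered as 0.0.
    """
    # Assumption: center on the unweighted mean of the coefficients.
    center = sum(log_odds.values()) / len(log_odds)
    return {f: inverse_logit(b - center) for f, b in log_odds.items()}

# Hypothetical coefficients for a three-factor group (reference level = 0.0).
coefficients = {"obstruent": 0.0, "vowel": -1.2, "pause": 0.3}
for factor, weight in factor_weights(coefficients).items():
    print(f"{factor}: {weight:.3f}")
```

Weights above .5 then indicate factors that favor the variant being modeled,
and weights below .5 indicate factors that disfavor it.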

5) Factor analytic procedures in language analysis

Ed Finegan

University of Southern California

In its linguistic applications, the statistical technique called factor

analysis can be used to uncover patterned variation by deriving a relatively

small set of underlying variables (called 'factors') from large sets of

variable linguistic features. The workshop demonstrates the use of this

technique for identifying factors that underlie large-scale variation of

linguistic features across texts and for interpreting those factors as

linguistic constructs (usually called 'dimensions'). Also included: the

Promax rotation technique for minimizing the number of factors on which any

linguistic feature loads; appropriateness of factor analysis to different

kinds of linguistic investigations; pros and cons of factor analysis for

linguistic inquiry in general.

6) Correspondence (Dual Scaling) Analysis

Wladyslaw Cichocki

University of New Brunswick

This session demonstrates correspondence analysis (CA), a statistical

technique which is closely related to multidimensional scaling and factor

analysis. CA is particularly helpful in studying the type of categorical,

ordinal and frequency data commonly found in empirical linguistic

investigations. While CA is predominantly a data-exploration technique, it

can be used to formulate hypotheses. The presentation will avoid complicated

algebraic formulas and will emphasize instead the simple graphical displays

that are used to interpret and understand data structure. Applications will

be chosen from dialectology, phonetics, sociolinguistics and syntax.

Discussion will include issues of interpretation, stability and statistical

significance as well as a review of available computer software.