Glossary
This glossary defines terms frequently used in the following research areas: Instructional Practices, Assessment, Evaluation, Research Design, CLIPs (Communities of Learning, Inquiry and Practice), Organizational Planning/Development, and Cultural Competence. The terms are organized alphabetically within each of these areas.
The terms were taken directly from the sources listed at the end of this document. Additional terms will be added over time.
- Hidden curriculum
- The knowledge, values, and behaviors that are taught tacitly by the way
in which schools are structured and classroom instruction is organized.
--A-C--
- Anchor(s)
- An "anchor" is the specific product or performance used for setting the
standards in an assessment. The anchors are the representative performances
used to "anchor" each place on the scoring scale. (The top "anchor" is
sometimes called the 'exemplar'). Without the "anchors" our assessment would
be too relative or norm-referenced: the "best" result would simply be the
best of what we received in the assessment; that "best" might still be
mediocre. The anchors thus set the standard. They also make the test
criterion-referenced: we would no longer expect scores to be distributed
along a normal curve. We might get very few products or performances (or even
none at all) that match the quality of the top anchor.
- Authentic assessment
- An "authentic" assessment is composed of worthy tasks—challenges which
we want students to master. Authentic assessment thus teaches the students
(and teachers) what demonstrated uses of subject matter are considered most
important. The test tasks are chosen because they are representative or
simulated versions of essential questions or challenges facing practitioners
in the field.
An "authentic" test thus directly measures students on the performances we
value. Multiple-choice tests are, by definition, indirect. They are "proxy"
forms of measurement (though perhaps valid; see Validity). By calling a test
or the tasks which compose it "inauthentic" the speaker is suggesting that
the "items" are simplistic and overly-indirect forms of testing.
In sum, an authentic assessment should: 1) engage the student in challenges
that represent the "tests" likely to face them as professionals, citizens,
or consumers; 2) be composed of tasks that look like the best kinds of
instructional activities: oral histories, science labs, computer
simulations, debates, research projects, etc.
- Benchmark
- The specific performance or achievement, chosen by goal setters, that
sets the standard for performance at the challenge in question. (See
Standard).
- Bloom's taxonomy
- Some forty years ago Benjamin Bloom and his colleagues developed a
schema for distinguishing the simplest forms of recall from the most
sophisticated uses of knowledge in designing student assessments. The six
elements were called Knowledge, Comprehension, Application, Analysis,
Synthesis, and Evaluation.
Note, too, that Bloom and his colleagues explicitly warned against thinking
of the taxonomy as a sequential schema for teaching and testing. The
taxonomy does not imply, in other words, that the most appropriate way to
teach and test is to move “up” the cognitive ladders, step-by-step. We know
that higher-order tasks often do not get taught or tested because of this
mistaken view. Nor, therefore, should we assume that students are incapable
of or not ready for higher-order tasks if they do poorly on tests of
lower-level facts and comprehension.
- Computer-adaptive testing
- An approach to measurement in which the difficulty level of test items
presented to a particular test-taker is matched by the computer to the test
taker’s ability level as judged from performance on earlier test items.
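The matching idea can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical item bank with difficulty levels 1 (easiest) to 5 and a simple up/down rule: one level harder after a correct answer, one level easier after an incorrect one. The function names and the rule are assumptions for illustration, not taken from the sources; operational adaptive tests typically rely on more elaborate statistical estimates of ability.

```python
def next_difficulty(current, was_correct, lowest=1, highest=5):
    """Choose the next item's difficulty level from the last response."""
    step = 1 if was_correct else -1
    # Stay within the difficulty levels assumed to exist in the item bank.
    return max(lowest, min(highest, current + step))


def levels_presented(responses, start=3):
    """Return the difficulty level of each item shown, given True/False responses."""
    levels = [start]
    # Each response except the last determines the level of the following item.
    for was_correct in responses[:-1]:
        levels.append(next_difficulty(levels[-1], was_correct))
    return levels


print(levels_presented([True, True, False, True]))
```

A test-taker who answers the first two items correctly is moved up two levels, then back down one after the miss.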
- Criteria
- To ask "What are the criteria to be used in judging student work?"
amounts to asking "Where should we look in examining this product or
performance? What aspects of performance are most important? For what kinds
of errors will we then take points off, and to what degree?" We must also
determine how much we should weight each criterion relative to other
criteria in making our judgment; yes, spelling and development of ideas are
both important in judging writing; what percent should we assign to each?
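The weighting question can be made concrete with a small sketch. The criteria names, the 6-point scores, and the 4-to-1 weighting below are made-up values chosen for illustration; the sources do not prescribe any particular weights.

```python
def weighted_score(scores, weights):
    """Combine per-criterion scores into one score using relative weights."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    # Each criterion contributes in proportion to its share of the total weight.
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical essay: development of ideas counts four times as much as spelling.
essay_scores = {"development of ideas": 5, "spelling": 3}
essay_weights = {"development of ideas": 4, "spelling": 1}
print(weighted_score(essay_scores, essay_weights))
```

Changing the weights changes the judgment: with equal weights the same essay would average 4.0 rather than 4.6.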
- Criterion-referenced measurement
- An approach to testing in which an individual’s score on a test is
interpreted by comparing it to a pre-specified standard of performance.
--D-G--
- Derived score
- A transformation of a raw score (e.g., age equivalents) to reveal the
individual’s performance relative to a norming group.
- Diagnostic test
- A type of measure that is used to identify a student’s strengths and
weaknesses in a particular school subject.
- Domain-referenced measurement
- A type of criterion-referenced measurement that assesses how well an
individual performs on a sample of items that represents a well-defined
content area.
- Face validity
- The extent to which a casual, subjective inspection of a test’s items
indicates that they cover the content that the test is claimed to measure.
--H-K--
--L-O--
- Measurement error
- In classical test theory, the difference between an individual’s true
score on a test and the scores that the individual actually obtains on it
when it is administered over a variety of conditions.
- Norming group
- A large sample (ideally one that is representative of a well-defined
population) whose scores on a test provide a set of standards against which
the scores of subsequent individuals who take the test can be referenced.
- Norm-referenced measurement
- An approach to testing in which an individual’s score on a test is
interpreted by comparing it to the scores earned by a norming group.
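One way such a comparison is often expressed is as a percentile rank. The sketch below assumes the simplest convention (the percentage of norming-group scores strictly below the individual's score); other conventions handle tied scores differently. The norming scores are made up for illustration.

```python
def percentile_rank(score, norming_scores):
    """Percentage of norming-group scores that fall below the given score."""
    if not norming_scores:
        raise ValueError("norming group must not be empty")
    below = sum(1 for s in norming_scores if s < score)
    return 100.0 * below / len(norming_scores)


norming_group = [52, 61, 67, 70, 74, 78, 81, 85, 90, 96]  # hypothetical scores
print(percentile_rank(80, norming_group))
```

The same raw score of 80 would earn a different percentile rank against a different norming group, which is what distinguishes this interpretation from a criterion-referenced one.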
--P-R--
- Performance assessment
- To perform is to "act upon and bring to completion." To perform in the
intellectual realm involves using one’s knowledge to effectively act or
bring to fruition a complex product in which one’s knowledge and expertise
are revealed. Music recitals and auto mechanic competitions are performances
in both senses; so are oral exams.
A performance assessment thus differs from a conventional paper and pencil
test in the same way that the driving test for one’s license differs from
the written test. In the former case, the test is meant to realistically
simulate driving ‘performance’—to replicate some typical "tests" that arise
in daily driving. In the latter case, we test for knowledge of driving facts
and rules, not whether the student knows how to employ them in "performing"
the act of driving.
- Portfolio (folio)
- In performance assessment, a purposeful collection of a student’s work
that records the student’s progress in mastering a subject domain (e.g.,
writing in multiple genres) along with the student’s personal reflections on
his or her progress.
A representative and judicious collection of one’s work. As the word’s roots
suggest (and as is still the case in the arts), the collection is carried
from place to place for inspection or exhibition, usually as a kind of
résumé.
- Process
- In the context of assessment "process" refers to the intermediate steps
the student takes in reaching the final performance or end-product specified
in the assessment. "Process" thus includes all strategies, decisions,
sub-skills, rough drafts, and rehearsals used in completing the given task.
- Product
- A product is the tangible and stable residue of a performance and the
processes that led to it. The product is valid for assessing the student’s
knowledge to the extent that success or failure in producing the product a)
is dependent upon the knowledge we taught and want to assess, and b)
appropriately "samples" from the whole curriculum in a way that mirrors the
relative importance of the material in the course (as reflected in the test
blueprint).
- Raw Score
- An individual score on a measure as determined by the scoring key,
without any further statistical manipulation.
- Reliability
- Reliability in testing refers to the likelihood that the score or grade
would be constant if the test were re-taken or the same performance were
re-scored by someone else. Error is unavoidable; all tests, including the
best multiple-choice tests, lack 100% reliability. The aim is to minimize it
to tolerable levels.
- Rubric
- A rubric is a set of scoring guidelines for giving scores to student
work. (The word derives from the Latin word for "red" and was once used to
signify the directions for conducting religious services, found in the
margins of liturgical books—and written in red). The rubric answers the
question "What does mastery (and varying degrees of mastery) at this task
look like?"
In performance assessment, a scale for measuring different levels of
proficiency demonstrated in students’ portfolios.
A typical rubric: 1) contains a scale of different possible points to be
assigned, often ranging from 4 or 6 as the top score, down to 1 or 0 for the
lowest scores in performance assessment; 2) states all the different traits
or dimensions to be primarily examined and assessed (e.g. "syntax" or
"understanding of scientific method"); and 3) provides key signs or salient
traits of performance or product for finding the right place on the scoring
scale to which a particular student result corresponds. Note, therefore,
that a rubric signifies that the assessment is "criterion-referenced,"
implying that scores should not necessarily be distributed along the
"normal" curve.
--S-T--
- Scale
- The demarcated continuum (number-line) for scoring performance; the
range of numbers (or letters) within which we score work. Performance
assessment typically uses a much smaller scale for scoring than standardized
tests. Rather than a scale of 100 or more, most performance-based assessment
uses a 6-point scale; rarely does a scale contain more than 10 points.
There are two inter-related reasons for this. First, each place on the scale is
not arbitrary (as it is in norm-referenced scoring); it is meant to correspond
to a specific criterion or quality of work. The second reason is practical:
to use a scale of so many discrete numbers makes reliability unlikely, and
attempts at such fine criterion-referenced distinctions become picky or
arbitrary.
- Standard(s)
- To ask “What is the standard on this assessment?” is to wonder about how
well the student must perform to do well or adequately. But confusions
abound; the word is sometimes used in education as a synonym for "high
expectations" and at other times as a synonym for "benchmark," meaning the best
that the performance or product can be done or has been done. And this second view
involves a further ambiguity: do we mean "the best performance by students
of that age and experience?" Or the "best performance we have seen at any
level?"
In any case, the standard is not the criteria. The criteria for the high
jump or the persuasive essay are more or less fixed—no matter the age or
ability of the student. All high jumps that are successful must ensure that
the bar stays on; all persuasive essays should use lots of appropriate
evidence and argument effectively. But how high should the bar be? How
sophisticated and fluent should the essay be? That is the "standards"
question. And clearly, it depends upon the purpose of the assessment.
- Standardized test
- A test for which procedures have been developed to ensure consistency in
administration and scoring across all testing situations.
- Task
- A task is a complex assessment activity. (The British use the phrase
"integrated task" to capture this idea.) It demands that we bring to bear a
repertoire of knowledge and skill to solve a multi-faceted problem or
question through a series of judgments and actions. Most tasks are goal
directed: they are "done" when we have successfully fashioned a performance
or product to specifications. A task thus differs from a conventional test
item in the same way that "successfully building a balsa bridge to withstand
y pounds per square inch" differs from solving physics textbook problems.
- Test
- A structured performance situation that can be analyzed to yield
numerical scores, from which inferences are made about how individuals
differ in the construct measured by the test.
- Test norms
- For a particular test, the scores of a large group, typically converted
to percentiles or another type of derived score, to which the scores of
subsequent test-takers are compared.
- Test reliability
- The extent to which the scores yielded by a test are free of measurement
error.
- Test-retest reliability
- An approach to estimating test reliability in which individuals’ scores
on a test administered at one point in time are correlated with their scores
on the same test administered at another point in time.
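The correlation described here can be sketched as a Pearson product-moment coefficient computed over the two sets of scores. The choice of the Pearson coefficient and the scores below are assumptions for illustration; the definition does not name a specific statistic.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    mean_1 = sum(first) / len(first)
    mean_2 = sum(second) / len(second)
    # Sum of cross-products and sums of squares of deviations from each mean.
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("scores must vary in both administrations")
    return cross / sqrt(ss_1 * ss_2)


# Hypothetical scores for the same five students at two points in time.
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 12, 14, 19, 19]
print(round(pearson_r(time_1, time_2), 3))
```

A coefficient near 1 indicates that students kept roughly the same relative standing across the two administrations.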
--U-Z--
- Validity
- A test is valid if it is an apt instrument for making inferences about a
student’s ability. Does the test design correspond to the “blueprint” of the
course syllabus? Do the results correspond to what these students would
likely do if they were confronted with a variety of "authentic" tasks
requiring such knowledge? Does the small sample of questions accurately
correlate with what students would do if we tested them on everything that
was taught in the course? Do the results have predictive value, i.e. do they
correlate with likely future success in the subject in question? If the
answers are "yes" then the test is valid.
--A-C--
- Confirmation survey interview
- In qualitative research, a type of interview that is used to confirm the
findings obtained from data that were collected by other methods.
- Content analysis
- The study of particular aspects of the information contained in a
document, film, or other form of communication.
- Culture
- The sum total of ways of living (e.g., values, customs, rituals, and
beliefs) that are built up by a group of human beings and that are
transmitted from one generation to another or from current members to newly
admitted members.
--D-G--
- Descriptive observational variable (or low-inference variable)
- A variable that requires little inference on the part of an observer to
determine its presence or level.
- Focus group interview
- A type of interview involving an interviewer and a group of research
participants, who are free to talk with and influence each other in the
process of sharing their ideas and perceptions about a defined topic.
- General interview guide approach
- A type of interview in which a set of topics is planned, but the order
in which the topics are covered and the wording of questions are decided as
the interview proceeds.
--H-K--
- Interview schedule
- A measure that specifies the questions to be asked of each research
participant, the sequence in which they are to be asked, and guidelines for
what the interviewer is to say at the opening and closing of the interview.
--L-O--
- Likert scale
- A measure that asks individuals to check their level of agreement with
various statements about an attitude object (e.g., strongly agree, agree,
undecided, disagree, or strongly disagree).
- Oral history
- The use of oral interviews of individuals who witnessed or participated
in particular events as sources of data about the past; also, the use of
ballads, tales, and other forms of spoken language as sources of data about
the past.
--P-R--
- Questionnaire
- A measure that presents a set of written questions to which all
individuals in a sample respond.
--S-T--
- Structured interview
- A type of interview in which the interviewer asks a series of
closed-form questions that either have yes-no answers or can be answered by
selecting from among a set of short-answer choices.
--U-Z--
- Unstructured interview
- A type of interview in which the interviewer does not use a detailed
interview guide, but instead asks situationally determined questions that
gradually lead respondents to give the desired information.
--A-C--
- A-B design
- A type of single-case experiment in which the researcher institutes a
baseline condition (A), followed by the treatment (B). The target behavior
is measured repeatedly during both conditions.
- A-B-A design
- A type of single-case experiment in which the researcher institutes a
baseline condition (A), administers the treatment (B), and institutes a
second baseline condition (A). The target behavior is measured repeatedly
during all three conditions.
- A-B-A designs
- Any single-case experiment that has at least one baseline condition
(designated A) and one treatment condition (designated B).
- A-B-A-B design
- A type of single-case experiment in which the researcher institutes a
baseline condition (A), administers the treatment (B), institutes a second
baseline condition (A), and then re-administers the treatment (B). The
target behavior is measured repeatedly during all four conditions.
- Accessible population
- All the members of a set of people, events, or objects who feasibly can
be included in the researcher’s sample.
- Acquiescence bias
- In testing, a type of response set in which individuals agree with items
irrespective of their content.
- Alpha level
- The level of statistical significance that is selected prior to data
collection for rejecting a null hypothesis.
- Analysis of covariance (ANCOVA)
- A procedure for determining whether the difference between the mean
scores of two or more groups on one or more dependent variables is
statistically significant, after controlling for initial differences between
the groups on one or more extraneous variables. When the groups have been
classified on several independent variables (called factors), the procedure
can be used to determine whether each factor and the interactions between
the factors have a statistically significant effect on the dependent
variable, after controlling for the extraneous variable.
- Analysis of variance (ANOVA)
- A procedure for determining whether the difference between the mean
scores of two or more groups on a dependent variable is statistically
significant. When the groups have been classified on several independent
variables (called factors), the procedure can be used to determine whether
each factor and the interactions between the factors have a statistically
significant effect on the dependent variable.
- Analytic induction
- In qualitative research, the process of inferring themes and patterns
from an examination of data.
- Audit trail
- In a literature review, an account of all the procedures and decision
rules that were used by the reviewer. In qualitative research, the process
of documenting the materials and procedures used in each phase of a study.
- Bias
- A mental set, or predisposition, to perceive events or other phenomena in
such a way that certain facts are habitually overlooked, distorted, or falsified.
- Case study research
- The in-depth study of instances of a phenomenon in its natural context
and from the perspective of the participants involved in the phenomenon.
- Chi-square (χ²) test
- A nonparametric test of statistical significance that is used when the
research data are in the form of frequency counts for two or more
categories.
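As a rough sketch, the chi-square statistic sums, over the categories, the squared difference between observed and expected frequency counts divided by the expected count. The counts below are made up, and the sketch stops at the statistic itself; judging significance would still require comparing it with a critical value for the appropriate degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected across categories."""
    if len(observed) != len(expected) or len(observed) < 2:
        raise ValueError("need matching counts for two or more categories")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical: 50 students choose between two options; an even split is expected.
observed_counts = [30, 20]
expected_counts = [25, 25]
print(chi_square_statistic(observed_counts, expected_counts))
```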
- CIPP model
- A type of evaluation that is designed to support the decision-making
process in program management.
CIPP is an acronym for the four types of educational evaluation included in
the model: Context evaluation, Input evaluation, Process evaluation, and
Product evaluation.
- Cohort longitudinal research
- A type of investigation in which changes in a population over time are
studied by selecting a different sample at each data collection point from a
population that remains constant.
- Collinearity
- The degree of correlation between any two variables that are to be used
as predictors in a multiple regression analysis.
- Compensatory rivalry (or John Henry effect)
- In experiments, a situation in which control group participants perform
beyond their usual level because they perceive that they are in competition
with the experimental group.
- Control group
- In an experiment, a group of research participants who receive no
treatment or an alternate treatment so that the effect of extraneous
variables can be determined.
- Convenience sample
- A group of cases that are selected simply because they are available and
easy to access.
- Correlational research
- A type of investigation that seeks to discover the direction and
magnitude of the relationship among variables through the use of
correlational statistics.
- Cross-sectional longitudinal research
- A type of investigation in which changes in a population over time are
studied by collecting data at one point in time, but from samples that vary
in age or developmental stage.
--D-G--
- Dependent variable
- A variable that the researcher thinks occurred after, and as a result
of, another variable (called the independent variable). In a hypothesized
cause-and-effect relationship, the dependent variable is the effect.
- Descriptive research
- In quantitative research, a type of investigation that measures the
characteristics of a sample or population on pre-specified variables. In
qualitative research, a type of investigation that involves providing a
detailed portrayal of one or more cases.
- Educational evaluation
- The process of making judgments about the merit, value, or worth of an
educational program, method, or other phenomenon.
- Effect size
- An estimate of the magnitude of a difference, a relationship, or other
effect in the population represented by a sample.
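One widely used effect size index for a difference between two group means is the standardized mean difference, often called Cohen's d. The sketch below assumes that index and a pooled standard deviation; the definition above is more general, and the scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference between two groups, using a pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


treatment_scores = [2, 4, 6]  # hypothetical
control_scores = [1, 3, 5]    # hypothetical
print(cohens_d(treatment_scores, control_scores))
```

Because the difference is divided by a standard deviation, the result is expressed in standard deviation units rather than in raw score points.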
- Expertise-based evaluation
- The use of experts to make judgments about the worth of an educational
program.
- External validity
- The extent to which the results of a research study can be generalized
to individuals and situations beyond those involved in the study.
- Extraneous variable
- In experiments, any aspect of the situation, other than the treatment
variable, that can influence the dependent variable and that, if not
controlled, can make it impossible to determine whether the treatment
variable is responsible for any observed effect on the dependent variable.
- Factor analysis
- A statistical procedure for reducing a set of measured variables to a
smaller number of variables (called factors or latent variables) by
combining variables that are moderately or highly correlated with each
other.
- Formative evaluation
- A type of evaluation that is done while a program is under development
in order to improve its effectiveness, or to support a decision to abort
further development so that resources are not wasted.
- Goal-free evaluation
- A type of evaluation research in which the evaluator investigates the
actual effects of a program without being influenced by prior knowledge of
the program’s stated goals.
--H-K--
- Halo effect
- The tendency for the observer’s early impressions of an individual being
observed to influence the observer’s ratings of all variables involving the
same individual.
- Hawthorne effect
- An observed change in research participants’ behavior based on their
awareness of participating in an experiment, their knowledge of the
researcher’s hypothesis, or their response to receiving special attention.
- Historical research
- The study of past phenomena for the purpose of gaining a better
understanding of present institutions, practices, trends, and issues.
- Hypothesis
- The researcher’s prediction, derived from a theory or from speculation,
about how two or more measured variables will be related to each other.
- Independent variable
- A variable that the researcher thinks occurred prior in time to, and had
an influence on, another variable (called the dependent variable). In a
hypothesized cause-and-effect relationship, the independent variable is the
cause.
- Institutional review board (IRB)
- A committee that is established by an institution to ensure that
participants in any research proposed by individuals affiliated with the
institution will be protected from harm.
--L-O--
- Main effect
- In experiments, the influence of a treatment variable by itself (i.e.,
not an interaction with any other variable) on a dependent variable.
- Matching
- A procedure that equates two or more groups on the extraneous variable Z
at the outset of a study so that it can be ruled out as an influence on any
relationship between X and Y that is later observed.
- Mean
- A measure of central tendency calculated by dividing the sum of the
scores in a set by the number of scores.
- Median
- A measure of central tendency corresponding to the middle point in a
distribution of scores.
- Mode
- A measure of central tendency corresponding to the most frequently
occurring score in a distribution of scores.
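The three measures of central tendency defined above (mean, median, and mode) can be compared on a single set of scores. This sketch uses Python's standard statistics module; the quiz scores are made up for illustration.

```python
from statistics import mean, median, mode


def central_tendency(scores):
    """Return the mean, median, and mode of a list of scores."""
    return {"mean": mean(scores), "median": median(scores), "mode": mode(scores)}


quiz_scores = [70, 75, 75, 80, 90]  # hypothetical scores
summary = central_tendency(quiz_scores)
print(summary)
```

Here the single high score of 90 pulls the mean above the median and the mode, which both fall on 75.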
- Multiple regression
- A statistical procedure for determining the magnitude of the
relationship between a criterion variable and a combination of two or more
predictor variables.
- Normal curve (or normal probability curve)
- A distribution of scores that form a symmetrical, bell-shaped curve when
plotted on a graph.
- Null hypothesis
- A prediction that no relationship between two measured variables will be
found, or that no difference between groups on a measured variable will be
found.
- One-group pretest-posttest design
- A type of experiment in which all participants are exposed to the same
conditions: measurement of the dependent variable (pretest), implementation
of the experimental treatment, and another measurement of the dependent
variable (posttest).
- One-shot case study design
- A type of experiment in which an experimental treatment is administered
and then a posttest is administered to measure the effects of the treatment.
--P-R--
- Path analysis
- A statistical method for testing the validity of a theory about causal
links between three or more measured variables.
- Pattern
- In case study research, an inference that particular phenomena within a
case or across cases are systematically related to each other. See also
relational pattern and causal pattern.
- Pilot study
- A small-scale, preliminary investigation that is conducted to develop
and test the measures or procedures that will be used in a research study.
- Positivism
- The epistemological doctrine that physical and social reality is
independent of those who observe it, and that observations of this reality,
if unbiased, constitute scientific knowledge.
- Postmodernism
- A broad social and philosophical movement that questions the rationality
of human action, the use of positivist epistemology, and any human endeavor
(e.g., science) that claims a privileged position with respect to the search
for truth or that claims progress in its search for truth.
- Postpositivism
- The epistemological doctrine that social reality is a construction, and
that it is constructed differently by different individuals.
- Poststructuralism
- The study of phenomena as systems, with the assumption that these
systems have no inherent meaning.
- Posttest
- A measure that is administered following an experimental or control
treatment or other intervention in order to determine the effects of the
intervention.
- Posttest-only control-group design
- A type of experiment that includes three phases: random assignment of
research participants to the experimental and control groups; administration
of the treatment to the experimental group and either no treatment or an
alternative treatment to the control group; and administration of a measure
of the dependent variable to both groups.
- Pretest
- A measure that is administered prior to an experimental treatment or
other intervention.
- Primary source
- A document (e.g., a journal article or a book) written by an individual
who actually conducted the research study, developed the theory, witnessed
the events, or formulated the opinions described in the document.
- Qualitative research (or postpositivist research)
- Inquiry that is grounded in the assumption that individuals construct
social reality in the form of meanings and interpretations, and that these
constructions tend to be transitory and situational. The dominant
methodology is to discover these meanings and interpretations by studying
cases intensively in natural settings and by subjecting the resulting data
to analytic induction.
- Quantitative research (or positivist research)
- Inquiry that is grounded in the assumption that features of the social
environment constitute an objective reality that is relatively constant
across time and settings. The dominant methodology is to describe and
explain features of this reality by collecting numerical data on observable
behaviors of samples and by subjecting these data to statistical analysis.
- Quasi-experiment
- A type of experiment in which research participants are not randomly
assigned to the experimental and control groups.
- Random assignment
- The process of assigning individuals or groups (e.g., classrooms) to the
experimental and control treatments such that each individual or group has
an equal chance of being in each treatment.
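A minimal sketch of random assignment: shuffle the units, then split the shuffled list in half. The function name, the classroom labels, and the optional seed (used only to make a run repeatable) are assumptions for illustration. With an odd number of units the control group receives the extra one, so the two groups are not exactly equal in size.

```python
import random


def randomly_assign(units, seed=None):
    """Shuffle units and split them into experimental and control groups."""
    rng = random.Random(seed)
    shuffled = list(units)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


classrooms = ["A", "B", "C", "D", "E", "F"]  # hypothetical classrooms
groups = randomly_assign(classrooms, seed=1)
print(groups)
```

Because the order is determined by chance alone, no characteristic of a classroom can systematically influence which treatment it receives.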
- Range
- A measure of the amount of dispersion in a distribution of scores; it is
expressed as the lowest and highest scores in the distribution.
- Reliability
- In case study research, the extent to which other researchers would
arrive at similar results if they studied the same case using exactly the
same procedures as the first researcher. In classical test theory, the
extent to which the scores yielded by a test are free of measurement error.
- Representative design
- The planning of experiments so that they reflect accurately both the
real-life environments in which the phenomena being studied occur and the
research participants’ natural behavior and cognitive processes in those
environments.
--S-T--
- Sampling
- The process of selecting members of a research sample from a defined
population, usually with the intent that the sample accurately represent
that population.
- Sampling error
- The deviation of a sample statistic from its population value.
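For illustration, the sketch below computes the sampling error of a mean for a tiny made-up population whose values are all known. In practice the population value is usually unknown, so sampling error is estimated rather than computed directly like this.

```python
from statistics import mean


def sampling_error_of_mean(sample, population):
    """Difference between the sample mean and the population mean."""
    return mean(sample) - mean(population)


population_scores = [1, 2, 3, 4, 5, 6]  # hypothetical complete population
sample_scores = [1, 2, 3]               # one possible sample from it
print(sampling_error_of_mean(sample_scores, population_scores))
```

A different sample from the same population would generally give a different error, positive or negative.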
- Sampling frame
- A list of all members of the population from which a sample will be
drawn.
- Scattergram (or scatterplot)
- A graph of the correlation between two variables, such that the scores
of individuals on one variable are plotted on the x axis of the graph and
the scores of the same individuals on another variable are plotted on the y
axis.
- Stanine
- A type of standard score with a distribution that has a mean of 5 and a
standard deviation of 2; the scores are continuous and have equality of
units.
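Given a mean of 5 and a standard deviation of 2, a raw score can be placed on the stanine scale by converting it to a z-score, doubling it, adding 5, and rounding to the nearest whole number from 1 to 9. That rounding-and-clipping rule is a common approximation assumed here for illustration, not something stated in the sources; the norming mean and standard deviation below are also made up.

```python
from math import floor


def to_stanine(raw_score, norm_mean, norm_sd):
    """Approximate stanine for a raw score, given the norming mean and SD."""
    if norm_sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (raw_score - norm_mean) / norm_sd
    # Rescale to mean 5 and SD 2, then round half up to a whole number.
    rounded = floor(2 * z + 5 + 0.5)
    # Stanines run only from 1 to 9, so extreme scores are clipped.
    return max(1, min(9, rounded))


print(to_stanine(115, norm_mean=100, norm_sd=15))
```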
- Statistic
- Any number that describes a characteristic of a sample’s scores on a
measure.
- Statistical power
- The probability that a particular test of statistical significance will
lead to the rejection of a false null hypothesis.
- Summative evaluation
- A type of evaluation that is conducted to determine the worth of a fully
developed program, especially in comparison with competing programs.
- Survey research
- The use of questionnaires or interviews to collect data about the
characteristics, experiences, knowledge, or opinions of a sample or a
population.
- t test
- A test of statistical significance that is used to determine whether the
null hypothesis that two sample means come from identical populations can be
rejected.
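A sketch of the statistic for two independent groups, assuming the pooled-variance (Student) form of the test. The scores are made up, and the sketch returns only the t value and its degrees of freedom; deciding whether to reject the null hypothesis would still require comparing t with a critical value at the chosen alpha level.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Return (t, degrees of freedom) for two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    df = n_a + n_b - 2
    # Pooled variance assumes the two populations have equal variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


t_value, df = independent_t([2, 4, 6], [1, 3, 5])  # hypothetical scores
print(round(t_value, 3), df)
```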
- Tacit knowledge
- Implicit meanings that the individuals being studied either cannot find
the words to express or that they take so much for granted that they do not
explicate them either in everyday discourse or in research interviews.
--U-Z--
- Unobtrusive measure (or nonreactive measure)
- A procedure for measuring variables by using data that are found
naturally in a field setting and that can be collected without field
participants’ awareness.
- Validity
- In testing, the appropriateness, meaningfulness, and usefulness of
specific inferences made from test scores.
- Variability
- The amount of dispersion in a distribution of scores; the greater the
variability of a set of scores, the more they deviate from their mean.
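To illustrate, the sketch below compares two made-up sets of scores that share the same mean but differ in dispersion. It uses the standard deviation (here the population form, `pstdev`) as the index of variability, which is one common choice and an assumption of this example.

```python
from statistics import mean, pstdev


def spread_summary(scores):
    """Return (lowest score, highest score, standard deviation)."""
    return min(scores), max(scores), pstdev(scores)


clustered = [48, 49, 50, 51, 52]  # hypothetical scores close to the mean
dispersed = [30, 40, 50, 60, 70]  # hypothetical scores far from the mean
print(mean(clustered), spread_summary(clustered))
print(mean(dispersed), spread_summary(dispersed))
```

Both sets have a mean of 50, yet the second has a much wider range and a much larger standard deviation.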
Terms will be added as we go along.
Books
The terms above were taken directly from the following two sources:
- Gall, M., Borg, W., & Gall, J. (1996). Educational research: An
introduction. Sixth Edition. White Plains, NY: Longman.
- Wiggins, G., Browne, J., & Houston, H. (1991). Standards, not
standardization. Stow, MA: Greater Insights.