Glossary

Introduction

This glossary defines terms frequently used in the following research areas: Instructional Practices, Assessment, Evaluation, Research Design, Communities of Learning, Inquiry and Practice (CLIPs), Organizational Planning/Development, and Cultural Competence. The terms are organized alphabetically within each of these areas.

The terms were taken directly from the sources listed at the end of this document. Additional terms will be added over time.

Instructional Practices

Hidden curriculum
The knowledge, values, and behaviors that are taught tacitly by the way in which schools are structured and classroom instruction is organized.

Student Assessment


--A-C--

Anchor(s)
An "anchor" is the specific product or performance used for setting the standards in an assessment. The anchors are the representative performances used to "anchor" each place on the scoring scale. (The top "anchor: is sometimes called the 'exemplar'). Without the "anchors: our assessment would be too relative or norm-referenced: the "best" result would simply be the best of what we received in the assessment; that "best" might still be mediocre. The anchors thus set the standard. They also make the test criterion-referenced: we would no longer expect scores to be distributed along a normal curve. We might get very few products or performances—or even none at all—the match the quality of the top anchor.
Authentic assessment
An "authentic" assessment is composed of worthy tasks—challenges which we want students to master. Authentic assessment thus teaches the students (and teachers) what demonstrated uses of subject matter are considered most important. The test tasks are chosen because they are representative or simulated versions of essential questions or challenges facing practitioners in the field.
An "authentic" test thus directly measures students on the performances we value. Multiple-choice tests are by definition, indirect. They are "proxy" forms of measurement (thought perhaps valid; see below). By calling a test or the tasks which compose it "inauthentic" the speaker is suggesting that the "items" are simplistic and overly-indirect forms of testing.
In sum, an authentic assessment should: 1) engage the student in challenges that represent the "tests" likely to face them as professionals, citizens, or consumers; 2) be composed of tasks that look like the best kinds of instructional activities: oral histories, science labs, computer simulations, debates, research projects, etc.
Benchmark
The specific performance or achievement, chosen by goal setters, that sets the standard for performance at the challenge in question. (See Standard).
Bloom's taxonomy
Some forty years ago Benjamin Bloom and his colleagues developed a schema for distinguishing the simplest forms of recall from the most sophisticated uses of knowledge in designing student assessments. The six elements were called Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation.
Note, too, that Bloom and his colleagues explicitly warned against thinking of the taxonomy as a sequential schema for teaching and testing. The taxonomy does not imply, in other words, that the most appropriate way to teach and test is to move "up" the cognitive ladder, step by step. We know that higher-order tasks often do not get taught or tested because of this mistaken view. Nor, therefore, should we assume that students are incapable of or not ready for higher-order tasks if they do poorly on tests of lower-level facts and comprehension.
Computer-adaptive testing
An approach to measurement in which the difficulty level of test items presented to a particular test-taker is matched by the computer to the test taker’s ability level as judged from performance on earlier test items.
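A minimal sketch of the adaptive idea in Python, using a hypothetical item pool and a simple staircase rule (operational systems typically estimate ability with item response theory rather than this up/down heuristic):

    import random

    # Hypothetical pool: 20 items at each difficulty level from 1 to 10.
    pool = {level: [f"item_{level}_{i}" for i in range(20)]
            for level in range(1, 11)}

    def administer(answers_correctly, start_level=5, num_items=10):
        """Present items one at a time, moving up in difficulty after a
        correct answer and down after an incorrect one."""
        level = start_level
        for _ in range(num_items):
            item = random.choice(pool[level])
            if answers_correctly(item, level):   # supplied by the caller
                level = min(level + 1, 10)
            else:
                level = max(level - 1, 1)
        return level  # final level is a rough estimate of ability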
Criteria
To ask "What are the criteria to be used in judging student work?" amounts to asking "Where should we look in examining this product or performance? What aspects of performance are most important? For what kinds of errors will we then take points off, and to what degree?" We must also determine how much we should weight each criterion relative to other criteria in making our judgment; yes, spelling and development of ideas are both important in judging writing; what percent should we assign to each?
Criterion-referenced measurement
An approach to testing in which an individual’s score on a test is interpreted by comparing it to a pre-specified standard of performance.

--D-G--

Derived score
A transformation of a raw score (e.g., age equivalents) to reveal the individual’s performance relative to a norming group.
Diagnostic test
A type of measure that is used to identify a student’s strengths and weaknesses in a particular school subject.
Domain-referenced measurement
A type of criterion-referenced measurement that assesses how well an individual performs on a sample of items that represents a well-defined content area.
Face validity
The extent to which a casual, subjective inspection of a test’s items indicates that they cover the content that the test is claimed to measure.

--H-K--

--L-O--

Measurement error
In classical test theory, the difference between an individual’s true score on a test and the scores that the individual actually obtains on it when it is administered over a variety of conditions.
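Classical test theory treats each observed score as the true score plus random error; a small simulation sketch with assumed values:

    import random

    true_score = 70  # the individual's (unobservable) true score
    # Five hypothetical administrations; error drawn from N(0, 3) by assumption.
    observed = [true_score + random.gauss(0, 3) for _ in range(5)]
    errors = [score - true_score for score in observed]  # measurement error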
Norming group
A large sample (ideally one that is representative of a well-defined population) whose scores on a test provide a set of standards against which the scores of subsequent individuals who take the test can be referenced.
Norm-referenced measurement
An approach to testing in which an individual’s score on a test is interpreted by comparing it to the scores earned by a norming group.

--P-R--

Performance assessment
To perform is to "act upon and bring to completion." To perform in the intellectual realm involves using one's knowledge to act effectively or to bring to fruition a complex product in which one's knowledge and expertise are revealed. Music recitals and auto mechanic competitions are performances in both senses; so are oral exams.
A performance assessment thus differs from a conventional paper and pencil test in the same way that the driving test for one’s license differs from the written test. In the former case, the test is meant to realistically simulate driving ‘performance’—to replicate some typical "tests" that arise in daily driving. In the latter case, we test for knowledge of driving facts and rules, not whether the student knows how to employ them in "performing" the act of driving.
Portfolio (folio)
In performance assessment, a purposeful collection of a student’s work that records the student’s progress in mastering a subject domain (e.g., writing in multiple genres) along with the student’s personal reflections on his or her progress.
A representative and judicious collection of one’s work. As the word’s roots suggest (and as is still the case in the arts), the collection is carried from place to place for inspection or exhibition, usually as a kind of résumé.
Process
In the context of assessment "process" refers to the intermediate steps the student takes in reaching the final performance or end-product specified in the assessment. "Process" thus includes all strategies, decisions, sub-skills, rough drafts, and rehearsals used in completing the given task.
Product
A product is the tangible and stable residue of a performance and the processes that led to it. The product is valid for assessing the student’s knowledge to the extent that success or failure in producing the product a) is dependent upon the knowledge we taught and want to assess, and b) appropriately "samples" from the whole curriculum in a way that mirrors the relative importance of the material in the course (as reflected in the test blueprint).
Raw Score
An individual score on a measure as determined by the scoring key, without any further statistical manipulation.
Reliability
Reliability in testing refers to the likelihood that the score or grade would be constant if the test were re-taken or the same performance were re-scored by someone else. Error is unavoidable; all tests, including the best multiple-choice tests, lack 100% reliability. The aim is to reduce error to tolerable levels.
Rubric
A rubric is a set of scoring guidelines for giving scores to student work. (The word derives from the Latin word for "red" and was once used to signify the directions for conducting religious services, found in the margins of liturgical books—and written in red). The rubric answers the question "What does mastery (and varying degrees of mastery) at this task look like?"
In performance assessment, a scale for measuring different levels of proficiency demonstrated in students' portfolios.
A typical rubric: 1) contains a scale of different possible points to be assigned, often ranging from 4 or 6 as the top score down to 1 or 0 for the lowest scores in performance assessment; 2) states all the different traits or dimensions to be primarily examined and assessed (e.g., "syntax" or "understanding of scientific method"); and 3) provides key signs or salient traits of performance or product for finding the right place on the scoring scale to which a particular student result corresponds. Note, therefore, that a rubric signifies that the assessment is "criterion-referenced," implying that scores should not necessarily be distributed along the "normal" curve.
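As a data structure, the guidelines for a single trait might look like the following hypothetical 4-point example (a full rubric would describe every trait at every score point):

    # Hypothetical 4-point rubric for one trait of a persuasive essay.
    use_of_evidence = {
        4: "Ample, well-chosen evidence; every major claim is supported.",
        3: "Adequate evidence; most major claims are supported.",
        2: "Some evidence, but key claims go unsupported.",
        1: "Little or no relevant evidence.",
    }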

--S-T--

Scale
The demarcated continuum (number-line) for scoring performance; the range of numbers (or letters) within which we score work. Performance assessment typically uses a much smaller scale for scoring than standardized tests. Rather than a scale of 100 or more, most performance-based assessment uses a 6-point scale; rarely does a scale contain more than 10 points.
There are two inter-related reasons for this. First, each place on the scale is not arbitrary (as it is in norm-referenced scoring): it is meant to correspond to a specific criterion or quality of work. The second reason is practical: a scale with many discrete points makes reliable scoring unlikely, and attempts at such fine criterion-referenced distinctions become picky or arbitrary.
Standard(s)
To ask "What is the standard on this assessment?" is to wonder how well the student must perform to do well or adequately. But confusions abound: the word is sometimes used in education as a synonym for "high expectations" and at other times as a synonym for "benchmark"—the best a performance or product can be or has been. And this second view involves a further ambiguity: do we mean the best performance by students of that age and experience, or the best performance we have seen at any level?
In any case, the standard is not the criteria. The criteria for the high jump or the persuasive essay are more or less fixed—no matter the age or ability of the student. All successful high jumps must leave the bar in place; all persuasive essays should use ample, appropriate evidence and argument effectively. But how high should the bar be? How sophisticated and fluent should the essay be? That is the "standards" question. And clearly, it depends upon the purpose of the assessment.
Standardized test
A test for which procedures have been developed to ensure consistency in administration and scoring across all testing situations.
Task
A task is a complex assessment activity. (The British use the phrase "integrated task" to capture this idea.) It demands that we bring to bear a repertoire of knowledge and skill to solve a multi-faceted problem or question through a series of judgments and actions. Most tasks are goal directed: they are "done" when we have successfully fashioned a performance or product to specifications. A task thus differs from a conventional test item in the same way that "successfully building a balsa bridge to withstand y pounds per square inch" differs from solving physics textbook problems.
Test
A structured performance situation that can be analyzed to yield numerical scores, from which inferences are made about how individuals differ in the construct measured by the test.
Test norms
For a particular test, the scores of a large group, typically converted to percentiles or another type of derived score, to which the scores of subsequent test-takers are compared.
Test reliability
The extent to which measurement error is absent from the scores yielded by a test.
Test-retest reliability
An approach to estimating test reliability in which individuals’ scores on a test administered at one point in time are correlated with their scores on the same test administered at another point in time.
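A sketch of the computation with made-up scores (statistics.correlation requires Python 3.10 or later):

    from statistics import correlation  # Python 3.10+

    time1 = [52, 60, 45, 70, 66, 58]  # hypothetical first administration
    time2 = [55, 62, 44, 68, 70, 57]  # same individuals, second administration
    r = correlation(time1, time2)     # Pearson r estimates reliability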

--U-Z--

Validity
A test is valid if it is an apt instrument for making inferences about a student's ability. Does the test design correspond to the "blueprint" of the course syllabus? Do the results correspond to what these students would likely do if they were confronted with a variety of "authentic" tasks requiring such knowledge? Does the small sample of questions accurately correlate with what students would do if we tested them on everything that was taught in the course? Do the results have predictive value, i.e., do they correlate with likely future success in the subject in question? If the answers are "yes," then the test is valid.

Context and Program Features Assessment


--A-C--

Confirmation survey interview
In qualitative research, a type of interview that is used to confirm the findings obtained from data that were collected by other methods.
Content analysis
The study of particular aspects of the information contained in a document, film, or other form of communication.
Culture
The sum total of ways of living (e.g., values, customs, rituals, and beliefs) that are built up by a group of human beings and that are transmitted from one generation to another or from current members to newly admitted members.

--D-G--

Descriptive observational variable (or low-inference variable)
A variable that requires little inference on the part of an observer to determine its presence or level.
Focus group interview
A type of interview involving an interviewer and a group of research participants, who are free to talk with and influence each other in the process of sharing their ideas and perceptions about a defined topic.
General interview guide approach
A type of interview in which a set of topics is planned, but the order in which the topics are covered and the wording of questions are decided as the interview proceeds.

--H-K--

Interview schedule
A measure that specifies the questions to be asked of each research participant, the sequence in which they are to be asked, and guidelines for what the interviewer is to say at the opening and closing of the interview.

--L-O--

Likert scale
A measure that asks individuals to check their level of agreement with various statements about an attitude object (e.g., strongly agree, agree, undecided, disagree, or strongly disagree).
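One common way to score such a measure, sketched with hypothetical item names (negatively worded items are reverse-scored so a high total consistently indicates a favorable attitude):

    SCALE = {"strongly agree": 5, "agree": 4, "undecided": 3,
             "disagree": 2, "strongly disagree": 1}

    def total_score(responses, reversed_items=()):
        """Sum item scores; `responses` maps item id -> response label."""
        total = 0
        for item, label in responses.items():
            value = SCALE[label]
            total += (6 - value) if item in reversed_items else value
        return total

    total_score({"q1": "agree", "q2": "strongly disagree"},
                reversed_items={"q2"})  # -> 4 + 5 = 9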
Oral history
The use of oral interviews of individuals who witnessed or participated in particular events as sources of data about the past; also, the use of ballads, tales, and other forms of spoken language as sources of data about the past.

--P-R--

Questionnaire
A measure that presents a set of written questions to which all individuals in a sample respond.

--S-T--

Structured interview
A type of interview in which the interviewer asks a series of closed-form questions that either have yes-no answers or can be answered by selecting from among a set of short-answer choices.

--U-Z--

Unstructured interview
A type of interview in which the interviewer does not use a detailed interview guide, but instead asks situationally determined questions that gradually lead respondents to give the desired information.

Evaluation/Research Methods/Designs


--A-C--

A-B design
A type of single-case experiment in which the researcher institutes a baseline condition (A), followed by the treatment (B). The target behavior is measured repeatedly during both conditions.
A-B-A design
A type of single-case experiment in which the researcher institutes a baseline condition (A), administers the treatment (B), and institutes a second baseline condition (A). The target behavior is measured repeatedly during all three conditions.
A-B-A designs
Any single-case experiment that has at least one baseline condition (designated A) and one treatment condition (designated B).
A-B-A-B design
A type of single-case experiment in which the researcher institutes a baseline condition (A), administers the treatment (B), institutes a second baseline condition (A), and then re-administers the treatment (B). The target behavior is measured repeatedly during all four conditions.
Accessible population
All the members of a set of people, events, or objects who feasibly can be included in the researcher’s sample.
Acquiescence bias
In testing, a type of response set in which individuals agree with items irrespective of their content.
Alpha level
The level of statistical significance that is selected prior to data collection for rejecting a null hypothesis.
Analysis of covariance (ANCOVA)
A procedure for determining whether the difference between the mean scores of two or more groups on one or more dependent variables is statistically significant, after controlling for initial differences between the groups on one or more extraneous variables. When the groups have been classified on several independent variables (called factors), the procedure can be used to determine whether each factor and the interactions between the factors have a statistically significant effect on the dependent variable, after controlling for the extraneous variable(s).
Analysis of variance (ANOVA)
A procedure for determining whether the difference between the mean scores of two or more groups on a dependent variable is statistically significant. When the groups have been classified on several independent variables (called factors), the procedure can be used to determine whether each factor and the interactions between the factors have a statistically significant effect on the dependent variable.
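A sketch of the one-way case with made-up data (assumes SciPy is installed; factorial designs require a fuller routine than this):

    from scipy import stats

    group_a = [78, 82, 75, 80]  # hypothetical scores for three groups
    group_b = [85, 88, 90, 84]
    group_c = [70, 72, 68, 74]
    f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
    # A p-value below the chosen alpha level suggests that at least
    # one group mean differs from the others.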
Analytic induction
In qualitative research, the process of inferring themes and patterns from an examination of data.
Audit trail
In a literature review, an account of all the procedures and decision rules that were used by the reviewer. In qualitative research, the process of documenting the materials and procedures used in each phase of a study.
Bias
A set to perceive events or other phenomena in such a way that certain facts are habitually overlooked, distorted, or falsified.
Case study research
The in-depth study of instances of a phenomenon in its natural context and from the perspective of the participants involved in the phenomenon.
Chi-square (χ²) test
A nonparametric test of statistical significance that is used when the research data are in the form of frequency counts for two or more categories.
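A goodness-of-fit sketch with hypothetical counts (assumes SciPy is installed; observed and expected totals must match):

    from scipy import stats

    observed = [18, 30, 12]  # hypothetical frequency counts, three categories
    expected = [20, 20, 20]  # counts expected under the null hypothesis
    chi2, p_value = stats.chisquare(observed, f_exp=expected)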
CIPP model
A type of evaluation that is designed to support the decision-making process in program management.
CIPP is an acronym for the four types of educational evaluation included in the model: Context evaluation, Input evaluation, Process evaluation, and Product evaluation.
Cohort longitudinal research
A type of investigation in which changes in a population over time are studied by selecting a different sample at each data collection point from a population that remains constant.
Collinearity
The degree of correlation between any two variables that are to be used as predictors in a multiple regression analysis.
Compensatory rivalry (or John Henry effect)
In experiments, a situation in which control group participants perform beyond their usual level because they perceive that they are in competition with the experimental group.
Control group
In an experiment, a group of research participants who receive no treatment or an alternate treatment so that the effect of extraneous variables can be determined.
Convenience sample
A group of cases that are selected simply because they are available and easy to access.
Correlational research
A type of investigation that seeks to discover the direction and magnitude of the relationship among variables through the use of correlational statistics.
Cross-sectional longitudinal research
A type of investigation in which changes in a population over time are studied by collecting data at one point in time, but from samples that vary in age or developmental stage.

--D-G--

Dependent variable
A variable that the researcher thinks occurred after, and as a result of, another variable (called the independent variable). In a hypothesized cause-and-effect relationship, the dependent variable is the effect.
Descriptive research
In quantitative research, a type of investigation that measures the characteristics of a sample or population on pre-specified variables. In qualitative research, a type of investigation that involves providing a detailed portrayal of one or more cases.
Educational evaluation
The process of making judgments about the merit, value, or worth of an educational program, method, or other phenomenon.
Effect size
An estimate of the magnitude of a difference, a relationship, or other effect in the population represented by a sample.
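One widely used index is Cohen's d, the difference between two group means expressed in pooled standard deviation units; a sketch:

    from statistics import mean, stdev

    def cohens_d(group1, group2):
        """Mean difference expressed in pooled-standard-deviation units."""
        n1, n2 = len(group1), len(group2)
        s1, s2 = stdev(group1), stdev(group2)
        pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2)
                  / (n1 + n2 - 2)) ** 0.5
        return (mean(group1) - mean(group2)) / pooled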
Expertise-based evaluation
The use of experts to make judgments about the worth of an educational program.
External validity
The extent to which the results of a research study can be generalized to individuals and situations beyond those involved in the study.
Extraneous variable
In experiments, any aspect of the situation, other than the treatment variable, that can influence the dependent variable and that, if not controlled, can make it impossible to determine whether the treatment variable is responsible for any observed effect on the dependent variable.
Factor analysis
A statistical procedure for reducing a set of measured variables to a smaller number of variables (called factors or latent variables) by combining variables that are moderately or highly correlated with each other.
Formative evaluation
A type of evaluation that is done while a program is under development in order to improve its effectiveness, or to support a decision to abort further development so that resources are not wasted.
Goal-free evaluation
A type of evaluation research in which the evaluator investigates the actual effects of a program without being influenced by prior knowledge of the program’s stated goals.

--H-K--

Halo effect
The tendency for the observer’s early impressions of an individual being observed to influence the observer’s ratings of all variables involving the same individual.
Hawthorne effect
An observed change in research participants’ behavior based on their awareness of participating in an experiment, their knowledge of the researcher’s hypothesis, or their response to receiving special attention.
Historical research
The study of past phenomena for the purpose of gaining a better understanding of present institutions, practices, trends, and issues.
Hypothesis
The researcher’s prediction, derived from a theory or from speculation, about how two or more measured variables will be related to each other.
Independent variable
A variable that the researcher thinks occurred prior in time to, and had an influence on, another variable (called the dependent variable). In a hypothesized cause-and-effect relationship, the independent variable is the cause.
Institutional review board (IRB)
A committee that is established by an institution to ensure that participants in any research proposed by individuals affiliated with the institution will be protected from harm.

--L-O--

Main effect
In experiments, the influence of a treatment variable by itself (i.e., not an interaction with any other variable) on a dependent variable.
Matching
A procedure that equates two or more groups on the extraneous variable Z at the outset of a study so that it can be ruled out as an influence on any relationship between X and Y that is later observed.
Mean
A measure of central tendency calculated by dividing the sum of the scores in a set by the number of scores.
Median
A measure of central tendency corresponding to the middle point in a distribution of scores.
Mode
A measure of central tendency corresponding to the most frequently occurring score in a distribution of scores.
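The three measures of central tendency defined above, computed on a small made-up distribution:

    from statistics import mean, median, mode

    scores = [70, 75, 75, 80, 100]
    mean(scores)    # 80.0 -- sum of scores divided by the number of scores
    median(scores)  # 75   -- middle point of the ordered distribution
    mode(scores)    # 75   -- most frequently occurring score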
Multiple regression
A statistical procedure for determining the magnitude of the relationship between a criterion variable and a combination of two or more predictor variables.
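A least-squares sketch with hypothetical data (assumes NumPy is installed; statistical packages layer significance tests on top of this fit):

    import numpy as np

    # Hypothetical data: predict exam score from hours studied and prior GPA.
    X = np.array([[5, 3.0], [10, 3.5], [2, 2.5], [8, 3.8], [6, 3.2]])
    y = np.array([72, 88, 60, 90, 78])
    X1 = np.column_stack([np.ones(len(X)), X])       # prepend intercept column
    coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)  # least-squares estimates
    predicted = X1 @ coeffs  # coeffs = [intercept, hours weight, GPA weight]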
Normal curve (or normal probability curve)
A distribution of scores that form a symmetrical, bell-shaped curve when plotted on a graph.
Null hypothesis
A prediction that no relationship between two measured variables will be found, or that no difference between groups on a measured variable will be found.
One-group pretest-posttest design
A type of experiment in which all participants are exposed to the same conditions: measurement of the dependent variable (pretest), implementation of the experimental treatment, and another measurement of the dependent variable (posttest).
One-shot case study design
A type of experiment in which an experimental treatment is administered and then a posttest is administered to measure the effects of the treatment.

--P-R--

Path analysis
A statistical method for testing the validity of a theory about causal links between three or more measured variables.
Pattern
In case study research, an inference that particular phenomena within a case or across cases are systematically related to each other. See also relational pattern and causal pattern.
Pilot study
A small-scale, preliminary investigation that is conducted to develop and test the measures or procedures that will be used in a research study.
Positivism
The epistemological doctrine that physical and social reality is independent of those who observe it, and that observations of this reality, if unbiased, constitute scientific knowledge.
Postmodernism
A broad social and philosophical movement that questions the rationality of human action, the use of positivist epistemology, and any human endeavor (e.g., science) that claims a privileged position with respect to the search for truth or that claims progress in its search for truth.
Postpositivism
The epistemological doctrine that social reality is a construction, and that it is constructed differently by different individuals.
Poststructuralism
The study of phenomena as systems, with the assumption that these systems have no inherent meaning.
Posttest
A measure that is administered following an experimental or control treatment or other intervention in order to determine the effects of the intervention.
Posttest-only control-group design
A type of experiment that includes three phases: random assignment of research participants to the experimental and control groups; administration of the treatment to the experimental group and either no treatment or an alternative treatment to the control group; and administration of a measure of the dependent variable to both groups.
Pretest
A measure that is administered prior to an experimental treatment or other intervention.
Primary source
A document (e.g., a journal article or a book) written by an individual who actually conducted the research study, developed the theory, witnessed the events, or formulated the opinions described in the document.
Qualitative research (or postpositivist research)
Inquiry that is grounded in the assumption that individuals construct social reality in the form of meanings and interpretations, and that these constructions tend to be transitory and situational. The dominant methodology is to discover these meanings and interpretations by studying cases intensively in natural settings and by subjecting the resulting data to analytic induction.
Quantitative research (or positivist research)
Inquiry that is grounded in the assumption that features of the social environment constitute an objective reality that is relatively constant across time and settings. The dominant methodology is to describe and explain features of this reality by collecting numerical data on observable behaviors of samples and by subjecting these data to statistical analysis.
Quasi-experiment
A type of experiment in which research participants are not randomly assigned to the experimental and control groups.
Random assignment
The process of assigning individuals or groups (e.g., classrooms) to the experimental and control treatments such that each individual or group has an equal chance of being in each treatment.
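A sketch with hypothetical participant IDs:

    import random

    participants = [f"P{i}" for i in range(1, 21)]  # 20 hypothetical people
    random.shuffle(participants)   # every ordering is equally likely
    experimental = participants[:10]
    control = participants[10:]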
Range
A measure of the amount of dispersion in a distribution of scores; it is expressed as the lowest and highest scores in the distribution.
Reliability
In case study research, the extent to which other researchers would arrive at similar results if they studied the same case using exactly the same procedures as the first researcher. In classical test theory, the amount of measurement error in the scores yielded by a test.
Representative design
The planning of experiments so that they reflect accurately both the real-life environments in which the phenomena being studied occur and the research participants’ natural behavior and cognitive processes in those environments.

--S-T--

Sampling
The process of selecting members of a research sample from a defined population, usually with the intent that the sample accurately represent that population.
Sampling error
The deviation of a sample statistic from its population value.
Sampling frame
A list of all members of the population from which a sample will be drawn.
Scattergram (or scatterplot)
A graph of the correlation between two variables, such that the scores of individuals on one variable are plotted on the x axis of the graph and the scores of the same individuals on another variable are plotted on the y axis.
Stanine
A type of standard score with a distribution that has a mean of 5 and a standard deviation of 2; the scores are whole numbers from 1 to 9 with approximately equal units.
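A conversion sketch using the common z-score cut points (an assumed convention; published tests supply their own conversion tables):

    from statistics import mean, pstdev

    def stanines(scores):
        """Convert raw scores to stanines (mean 5, SD 2, range 1-9)."""
        m, sd = mean(scores), pstdev(scores)
        return [min(9, max(1, int(2 * (s - m) / sd + 5.5))) for s in scores]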
Statistic
Any number that describes a characteristic of a sample’s scores on a measure.
Statistical power
The probability that a particular test of statistical significance will lead to the rejection of a false null hypothesis.
Summative evaluation
A type of evaluation that is conducted to determine the worth of a fully developed program, especially in comparison with competing programs.
Survey research
The use of questionnaires or interviews to collect data about the characteristics, experiences, knowledge, or opinions of a sample or a population.
t test
A test of statistical significance that is used to determine whether the null hypothesis that two sample means come from identical populations can be rejected.
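An independent-samples sketch with made-up data (assumes SciPy is installed):

    from scipy import stats

    treatment = [84, 90, 78, 88, 85]  # hypothetical group scores
    control = [75, 80, 72, 79, 77]
    t_stat, p_value = stats.ttest_ind(treatment, control)
    # Reject the null hypothesis if p_value falls below the alpha level.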
Tacit knowledge
Implicit meanings that the individuals being studied either cannot find the words to express or that they take so much for granted that they do not explicate them either in everyday discourse or in research interviews.

--U-Z--

Unobtrusive measure (or nonreactive measure)
A procedure for measuring variables by using data that are found naturally in a field setting and that can be collected without field participants’ awareness.
Validity
In testing, the appropriateness, meaningfulness, and usefulness of specific inferences made from test scores.
Variability
The amount of dispersion in a distribution of scores; the greater the variability of a set of scores, the more they deviate from their mean.

Communities of Learning, Inquiry and Practice

Terms will be added as we go along.

Organizational Planning/Development

Terms will be added as we go along.

Cultural Competence

Terms will be added as we go along.

Resources

Books

The terms above were taken directly from the following two sources:

  1. Gall, M. D., Borg, W. R., & Gall, J. P. (1996). Educational research: An introduction (6th ed.). White Plains, NY: Longman.
  2. Wiggins, G., Browne, J., & Houston, H. (1991). Standards, not standardization. Stow, MA: Greater Insights.