# Polling terms

**ABS**: Address-based sampling.

**Convenience sampling**: A non-probability sampling method in which the sample is selected based on the convenience and accessibility of the participants. In other words, the researcher selects individuals who are easy to reach or readily available, rather than using a random selection process.

Characteristics of a convenience sample:

Non-random selection: Participants are chosen based on their proximity or availability to the researcher.

Ease of recruitment: Convenience samples are often less time-consuming and less expensive to gather compared to probability-based samples.

Limited generalizability: Because the sample is not randomly selected, the results may not be representative of the entire population. This limits the ability to make statistical inferences or generalize the findings to the larger population.

Potential for bias: Convenience samples may be more prone to selection bias, as the sample may not accurately reflect the characteristics of the population. This can lead to over- or under-representation of certain groups.

Examples of convenience sampling:

A researcher surveying college students by sampling from their own classes.

An online survey shared on social media platforms, where participants self-select to respond.

A street survey where a researcher interviews people in a busy shopping area.

Despite its limitations, convenience sampling can be useful in certain situations, such as:

Pilot studies: Convenience samples can help researchers test and refine their methods before conducting a larger, more rigorous study.

Hard-to-reach populations: When studying populations that are difficult to access or identify, convenience sampling may be the only feasible option.

Exploratory research: Convenience samples can provide initial insights and generate hypotheses for further research.

However, it is essential for researchers to recognize and acknowledge the limitations of convenience sampling and to be cautious when interpreting and generalizing the results.

**Ground truth**: The actual state of something that a poll can only observe indirectly. In presidential preference polling, ground truth can never be checked in advance because it is not practical to obtain survey answers from all potential voters, and the election itself lies in the future, is not directly observable beforehand, and is not repeatable.

**IVR**: Interactive Voice Response (IVR) is a technology that allows a computer to interact with humans through the use of voice and DTMF (Dual-Tone Multi-Frequency) tones input via a keypad. In the context of surveys, IVR enables researchers to conduct automated telephone surveys without the need for human interviewers.

Here's how an IVR survey typically works:

The IVR system dials a list of telephone numbers.

When a call is answered, the system plays a pre-recorded or dynamically generated voice message that introduces the survey and provides instructions.

The respondent is then prompted to answer a series of questions by either speaking their responses or entering them using the phone's keypad.

The IVR system records the responses and moves on to the next question based on the answers provided and the survey's logic.

Once the survey is completed, the system thanks the respondent and ends the call.
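The call flow above can be sketched as a small branching questionnaire. This is only an illustration: the question ids, prompts, retry limit, and the `run_survey` function are hypothetical, and a list of keypresses stands in for the DTMF tones a real telephony system would receive.

```python
# Each question maps a keypad digit to (recorded answer, next question id).
# A next id of None ends the survey. All ids and wording are hypothetical.
SURVEY = {
    "q_registered": {
        "prompt": "Are you registered to vote? Press 1 for yes, 2 for no.",
        "choices": {"1": ("yes", "q_preference"), "2": ("no", None)},
    },
    "q_preference": {
        "prompt": "Press 1 for Candidate A, 2 for Candidate B, 3 for undecided.",
        "choices": {"1": ("A", None), "2": ("B", None), "3": ("undecided", None)},
    },
}


def run_survey(keypresses, start="q_registered", max_retries=2):
    """Walk the survey, consuming keypresses, and return the recorded answers."""
    answers = {}
    presses = iter(keypresses)
    current = start
    while current is not None:
        choices = SURVEY[current]["choices"]
        for _ in range(max_retries + 1):
            key = next(presses, None)  # None models the respondent hanging up
            if key is None:
                return answers  # partial interview
            if key in choices:
                break
        else:
            return answers  # too many invalid entries: end the call
        answer, next_question = choices[key]
        answers[current] = answer
        current = next_question  # branch according to the survey's logic
    return answers
```

For example, `run_survey(["1", "2"])` records a registered voter who prefers Candidate B, while `run_survey(["2"])` ends after the first question because the skip logic routes unregistered respondents out of the survey.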

Some key features and benefits of IVR surveys include:

Automation: IVR surveys can be conducted without human interviewers, reducing costs and allowing for larger sample sizes.

Consistency: All respondents hear the same questions in the same order, reducing variability due to interviewer bias.

24/7 availability: IVR surveys can be conducted at any time, allowing respondents to complete the survey at their convenience.

Multilingual support: IVR systems can offer surveys in multiple languages, making them more accessible to diverse populations.

However, IVR surveys also have some limitations:

Limited complexity: IVR surveys are best suited for short, simple questions with clear answer choices.

Lower response rates: Some people may be less likely to complete an automated survey than one conducted by a human interviewer.

Lack of personal touch: The automated nature of IVR surveys can make them feel impersonal, potentially affecting response quality.

Despite these limitations, IVR surveys remain a valuable tool for researchers, particularly when conducting large-scale, cost-effective surveys.

**Mail-in polls**: A type of opt-in survey.

**Opt-in survey**: A survey whose respondents choose to take part, giving explicit prior consent, rather than being randomly selected. Such surveys can lead to skewed results, such as overestimating fringe beliefs or the percentage of people holding certain qualifications, especially among younger respondents. The quality of the data collected is questionable, as participants may speed through surveys to earn money, leading to inaccurate responses. Researchers use online opt-in surveys because they are cheaper, faster, and more convenient than probability-based surveys, despite potential data quality issues. A study found that a significant portion of respondents in an opt-in survey may have misrepresented their background to qualify for the survey and earn money. Researchers can reduce bogus responses by using attention checks, IP address tracking, anti-bot software, and monitoring of survey completion time. However, some low-quality responses may still slip through. Alternative models, such as opt-in volunteer surveys, can create different incentives for participants that do not rely on financial rewards.

**Tracking panel**: A group of respondents, initially selected randomly, who are re-polled at some interval.

**Poll aggregation**: Combining the results of multiple polls to improve the accuracy and reliability of the overall estimate. There are several methods to do this, and the choice depends on factors such as the number of polls, their sample sizes, and the consistency of their results. Here are a few common approaches:

Simple average: Calculate the average percentage for each candidate across all polls. This method treats all polls equally, regardless of their sample size or quality.

Weighted average: Assign weights to each poll based on factors such as sample size, recency, or pollster rating, and then calculate the weighted average for each candidate. This method gives more importance to polls considered more reliable or representative. This is the method used to report aggregated swing state polls on the home page.

Inverse variance weighting: Weight each poll by the inverse of its variance (which is related to the margin of error). This method gives more weight to polls with smaller margins of error, as they are considered more precise. This is not currently being done because margins of error are similar.

Bayesian aggregation: Use a Bayesian model to estimate the probability distribution of the true proportion of support for each candidate, considering the results and uncertainty of each poll. This method can incorporate prior information and account for various sources of uncertainty. This is being done to arrive at an overall likelihood based on current polling in light of the 2020 results.
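The four approaches above can be compared on a toy example. The sketch below is not the code used on this site: the poll figures are made up, every function name is hypothetical, each poll is treated as a simple random sample of a two-way race, and the Bayesian step is the simplest possible beta-binomial update, which pools respondents across polls as if they were independent.

```python
# Hypothetical polls: one candidate's share and the sample size (made-up numbers).
polls = [
    {"share": 0.48, "n": 600},
    {"share": 0.52, "n": 1000},
    {"share": 0.50, "n": 400},
]


def simple_average(polls):
    """Every poll counts equally."""
    return sum(p["share"] for p in polls) / len(polls)


def weighted_average(polls, weights):
    """Weights may reflect sample size, recency, or pollster rating."""
    return sum(w * p["share"] for p, w in zip(polls, weights)) / sum(weights)


def sampling_variance(share, n):
    """Variance of a proportion, assuming simple random sampling."""
    return share * (1 - share) / n


def inverse_variance_average(polls):
    """More precise polls (smaller variance) get larger weights."""
    weights = [1 / sampling_variance(p["share"], p["n"]) for p in polls]
    return weighted_average(polls, weights)


def beta_posterior(polls, prior_a=1.0, prior_b=1.0):
    """Update a Beta(prior_a, prior_b) prior with each poll's respondent counts."""
    a, b = prior_a, prior_b
    for p in polls:
        supporters = round(p["share"] * p["n"])
        a += supporters
        b += p["n"] - supporters
    return a, b


by_size = weighted_average(polls, [p["n"] for p in polls])
a, b = beta_posterior(polls)
posterior_mean = a / (a + b)
```

With these numbers the simple average is 0.50, while weighting by sample size pulls the estimate toward the largest poll. Replacing the uniform Beta(1, 1) prior with one informed by a previous election is one way prior information could enter the estimate.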

**Population**: A collection of units of observation taken as a whole, such as registered voters, likely voters, or a narrower subgroup such as likely voters who are white, male, conservative registered Democrats. The goal is to classify every such unit. When the units cannot all be identified and queried, a sample is relied upon to be representative.

**Probability-based sample**: A probability-based sample is a type of sample in which every unit in the population has a known, non-zero probability of being selected. This sampling method relies on the principles of probability theory to choose a sample that is representative of the population of interest. The main advantage of probability-based sampling is that it allows researchers to make statistical inferences about the population based on the sample results.

Key features of a probability-based sample:

Random selection: Each unit in the population has an equal or known chance of being selected.

Representativeness: The sample is intended to be representative of the population, meaning that the characteristics of the sample closely resemble those of the population.

Generalizability: Because the sample is representative, findings from the study can be generalized to the larger population, within a certain margin of error.

Reduced bias: Probability-based sampling helps minimize selection bias, as the sample is chosen based on chance rather than the researcher's preferences or convenience.

Common probability-based sampling methods include:

Simple random sampling: Each unit in the population has an equal chance of being selected.

Stratified random sampling: The population is divided into subgroups (strata) based on specific characteristics, and then units are randomly selected from each stratum.

Cluster sampling: The population is divided into clusters (e.g., geographic areas), and then a random sample of clusters is selected. All units within the selected clusters are included in the sample.

Systematic sampling: Units are selected from the population at regular intervals (e.g., every 10th unit on a list) after a random starting point.
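The four methods above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions: the sampling frame is a hypothetical list of 100 unit ids, and the function names and the stratum and cluster rules are invented for the example.

```python
import random


def simple_random_sample(units, k, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(units, k)


def stratified_sample(units, stratum_of, k_per_stratum, rng):
    """Group units into strata, then draw randomly within each stratum."""
    strata = {}
    for u in units:
        strata.setdefault(stratum_of(u), []).append(u)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(k_per_stratum, len(members))))
    return sample


def cluster_sample(units, cluster_of, n_clusters, rng):
    """Draw whole clusters at random and keep every unit inside them."""
    clusters = {}
    for u in units:
        clusters.setdefault(cluster_of(u), []).append(u)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [u for c in chosen for u in clusters[c]]


def systematic_sample(units, step, rng):
    """Random starting point, then every step-th unit on the list."""
    start = rng.randrange(step)
    return units[start::step]


rng = random.Random(0)  # fixed seed so the draws are reproducible
frame = list(range(100))  # hypothetical sampling frame of 100 unit ids

srs = simple_random_sample(frame, 10, rng)
strat = stratified_sample(frame, lambda u: u % 4, 5, rng)
clus = cluster_sample(frame, lambda u: u // 10, 3, rng)
syst = systematic_sample(frame, 10, rng)
```

In each case the chance of selecting a given unit can be worked out from the design, which is what makes these probability-based methods.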

In contrast, non-probability sampling methods, such as convenience sampling or snowball sampling, do not give every unit in the population a known chance of being selected. While these methods can be useful in certain situations, they do not allow for statistical inference about the population and may be more prone to bias.

**Random sample**: A random sample is a subset of individuals chosen from a larger population in such a way that each individual has an equal probability of being selected. The purpose of taking a random sample is to obtain a representative group that can be used to make inferences about the larger population without bias.

Key characteristics of a random sample:

Equal probability of selection: Each member of the population has the same chance of being included in the sample.

Independence: The selection of one individual does not affect the probability of selecting any other individual.

Representativeness: A well-chosen random sample should be representative of the population, meaning that the characteristics of the sample closely mirror those of the larger population.

Unbiased: Random sampling helps minimize bias in the selection process, as it eliminates the possibility of the researcher consciously or unconsciously choosing individuals based on specific characteristics.

IID (Independent and Identically Distributed): In the context of random sampling, IID is a crucial concept. It means that each observation or data point in the sample is independent of the others and comes from the same underlying probability distribution.

Independence: The value of one observation does not influence the value of another observation. In other words, knowing the value of one data point provides no information about the value of any other data point.

Identical distribution: All observations in the sample come from the same probability distribution. This means that the sample data's statistical properties, such as mean and variance, are consistent across the sample and representative of the population.

When a random sample is IID, it allows researchers to use various statistical methods to make inferences about the population. For example, the Central Limit Theorem, which states that the distribution of sample means approaches a normal distribution as the sample size increases, in its classical form relies on the assumption that the observations are IID with finite variance.

However, in practice, achieving a truly IID random sample can be challenging. Factors such as sampling bias, non-response bias, and measurement errors can introduce dependence or differences in the distribution of observations. Researchers must be aware of these potential issues and take steps to minimize their impact on the sample's representativeness.
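The Central Limit Theorem statement above can be checked with a small simulation. The sketch assumes an idealized IID setting that real polls only approximate: each respondent independently supports a candidate with probability `p`. The function name, sample size, and repetition count are illustrative choices.

```python
import random
import statistics


def sample_means(n, reps, p=0.5, seed=0):
    """Simulate `reps` polls of `n` IID respondents; return each poll's share."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


means = sample_means(n=400, reps=2000)

observed_sd = statistics.pstdev(means)
theoretical_sd = (0.5 * 0.5 / 400) ** 0.5  # sqrt(p * (1 - p) / n) = 0.025

# Share of simulated polls within two standard deviations of the true value;
# for a normal distribution this is roughly 95%.
within_two_sd = sum(abs(m - 0.5) <= 2 * theoretical_sd for m in means) / len(means)
```

The spread of the simulated shares should be close to `theoretical_sd`, which is why a poll of about 400 respondents is usually quoted with a margin of error near plus or minus 5 percentage points.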

**Online opt-in polls**: Polls using non-random "convenience sampling" to recruit respondents from various online sources. Subject to self-selection bias.

**Probability-based panel**: A national survey panel recruited using random sampling from a database that includes most people in the population. Today, most such panels in the U.S. recruit by drawing random samples of residential addresses or telephone numbers. Typically, data collection with these panels is done online. However, some of these panels interview a small fraction of respondents (usually about 5% or fewer) using an offline mode such as live telephone. These panels are “probability-based” because the chance that each address or phone number was selected is known. However, the chance that each selected person will join the panel or take surveys after joining is not known.

**RBS**: Voter registration-based sampling.

**RDD**: Random-digit dialing.

**Sampling frame**: A list of the population of interest from which the survey sample is selected.

**Sponsor**: Organization that conceives of a poll, funds it, and publicly releases the results. Used interchangeably with "pollster".

**Text message polling**: Contacting a sample of cellphone numbers to direct respondents to an online survey or ask questions via text.

**Vendor**: Organization that collects the survey data, often sharing tasks with the sponsor.