# Which of the following statements is/are incorrect?

- To represent the share of a particular category, a bar chart is the most appropriate graphical representation.
- The product of the total number of observations and the relative frequency of a particular observation should equal the frequency of that observation.
- The mean can be defined for a categorical variable.
- The mode of a categorical variable is the widest slice in a pie chart.

### Mohammed

Guys, does anyone know the answer?

## 4. Descriptive Statistics and Graphic Displays

## Statistics in a Nutshell, 2nd Edition by Sarah Boslaugh

Most of this book, as is the case with most statistics books, is concerned with statistical inference, meaning the practice of drawing conclusions about a population by using statistics calculated on a sample. However, another type of statistics is the concern of this chapter: descriptive statistics, meaning the use of statistical and graphic techniques to present information about the data set being studied. Nearly everyone involved in statistical work works with both types of statistics, and often, computing descriptive statistics is a preliminary step in what will ultimately be an inferential statistical analysis. In particular, it is a common practice to begin an analysis by examining graphical displays of a data set and to compute some basic descriptive statistics to get a better sense of the data to be analyzed. You can never be too familiar with your data, and time spent examining it is nearly always time well spent. Descriptive statistics and graphic displays can also be the final product of a statistical analysis. For instance, a business might want to monitor sales volumes for different locations or different sales personnel and wish to present that information using graphics, without any desire to use that information to make inferences (for instance, about other locations or other years) using the data collected.

## Populations and Samples

The same data set may be considered as either a population or a sample, depending on the reason for its collection and analysis. For instance, the final exam grades of the students in a class are a population if the purpose of the analysis is to describe the distribution of scores in that class, but they are a sample if the purpose of the analysis is to make some inference from those scores to the scores of other students (perhaps students in different classes or different schools). Analyzing a population means your data set is the complete population of interest, so you are performing your calculations on all members of the group of interest to you and can make direct statements about the characteristics of that group. In contrast, analyzing a sample means you are working with a subset drawn from a larger population, and any statements made about the larger group from which your sample was drawn are probabilistic rather than absolute. (The reasoning behind inferential statistics is discussed further in Chapter 3.) Samples rather than populations are often analyzed for practical reasons because it might be impossible or prohibitively expensive to study all members of a population directly.

The distinction between descriptive and inferential statistics is fundamental, and a set of notational conventions and terminology has been developed to distinguish between the two. Although these conventions differ somewhat from one author to the next, as a general rule, numbers that describe a population are referred to as parameters and are signified by Greek letters such as µ (for the population mean) and σ (for the population standard deviation); numbers that describe a sample are referred to as statistics and are signified by Latin letters such as x̄ (the sample mean) and s (the sample standard deviation).

## Measures of Central Tendency

Measures of central tendency, also known as measures of location, are typically among the first statistics computed for the continuous variables in a new data set. The main purpose of computing measures of central tendency is to give you an idea of what a typical or common value for a given variable is. The three most common measures of central tendency are the arithmetic mean, the median, and the mode.

## The Mean

The arithmetic mean, or simply the mean, is often referred to in ordinary speech as the average of a set of values. Calculating the mean as a measure of central tendency is appropriate for interval and ratio data, and the mean of dichotomous variables coded as 0 or 1 provides the proportion of subjects whose value on the variable is 1. For continuous data, for instance measures of height or scores on an IQ test, the mean is simply calculated by adding up all the values and then dividing by the number of values. The mean of a population is denoted by the Greek letter mu (µ) whereas the mean of a sample is typically denoted by a bar over the variable symbol: for instance, the mean of x would be written x̄ and pronounced “x-bar.” Some authors adapt the bar notation for the names of variables also. For instance, some authors denote “the mean of the variable age” by writing age with a bar over it, which would be pronounced “age-bar.”

Suppose we have a population with only five cases, and these are the values for members of that population for the variable x:

100, 115, 93, 102, 97

We can calculate the mean of x by adding these values and dividing by 5 (the number of values):

µ = (100 + 115 + 93 + 102 + 97)/5 = 507/5 = 101.4
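The calculation above can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and the 0/1 coded list is made up here only to show the earlier point that the mean of a dichotomous variable is the proportion of cases coded 1.

```python
from statistics import mean

# The five population values for x given in the text.
x = [100, 115, 93, 102, 97]

# Add the values and divide by the number of values.
mu = sum(x) / len(x)

# statistics.mean performs the same calculation.
mu_check = mean(x)

# Hypothetical 0/1 coded variable (not from the text): its mean is the
# proportion of cases whose value is 1.
coded = [1, 0, 0, 1, 1, 0, 1, 1]
proportion_of_ones = mean(coded)

print(mu, mu_check, proportion_of_ones)
```

Note that the calculation only makes sense because the values are numeric; for a nominal variable such as household type there is nothing to add up.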

Statisticians often use a convention called summation notation, introduced in Chapter 1, which defines a statistic by describing how it is calculated. The computation of the mean is the same whether the numbers are considered to represent a population or a sample; the only difference is the symbol for the mean itself. The mean of a population, as expressed in summation notation, is shown in Figure 4-1.
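Figure 4-1 is not reproduced in this excerpt. As a sketch of what the summation notation expresses, assuming the common convention that n is the number of cases and x_i is the value of the i-th case, the population mean can be written as follows, with the five values from the example substituted in:

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\text{for example}\qquad
\mu = \frac{1}{5}\,(100 + 115 + 93 + 102 + 97) = \frac{507}{5} = 101.4
```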

## Soc 106 - Ch 3

Created by Rosie_Valencia6

### Terms in this set (41)

descriptive statistics

Statistical methods are descriptive or inferential. The purpose of descriptive statistics is to summarize data, to make it easier to assimilate the information.

Quantitative variables also have two key features to describe numerically:

- The center of the data—a typical observation

- The variability of the data—the spread around the center

Most importantly, the mean describes the center and the standard deviation describes the variability.

Association: how certain values for one variable may tend to go with certain values of the other. For quantitative variables, the correlation describes the strength of the association, and regression analysis predicts the value of one variable from a value of the other variable.

RELATIVE FREQUENCIES: CATEGORICAL DATA

For categorical variables, we list the categories and show the number of observations in each category. To make it easier to compare different categories, we also report proportions or percentages in the categories, also called relative frequencies. The proportion equals the number of observations in a category divided by the total number of observations. It is a number between 0 and 1 that expresses the share of the observations in that category. The percentage is the proportion multiplied by 100. The sum of the proportions equals 1.00. The sum of the percentages equals 100.

Table 3.1 lists the different types of households in the United States in 2015. Of 116.3 million households, for example, 23.3 million were a married couple with children, for a proportion of 23.3/116.3 = 0.20. A percentage is the proportion multiplied by 100. That is, the decimal place is moved two positions to the right. For example, since 0.20 is the proportion of families that are married couples with children, the percentage is 100(0.20) = 20%.
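The definitions above can be sketched in Python. The list of observations is hypothetical (it is not the Table 3.1 data); only the 23.3 and 116.3 figures come from the text. The sketch also shows that multiplying a relative frequency by the total number of observations gives back the frequency, and that the mode of a categorical variable is simply its most frequent category.

```python
from collections import Counter

# Hypothetical categorical observations (not the Table 3.1 data).
observations = ["rent", "own", "own", "rent", "own", "other", "own", "rent"]

counts = Counter(observations)  # frequency of each category
n = len(observations)           # total number of observations

# Relative frequency (proportion) = frequency / total number of observations.
proportions = {category: count / n for category, count in counts.items()}
percentages = {category: 100 * p for category, p in proportions.items()}

# Multiplying a proportion by n recovers the original frequency.
recovered = {category: round(p * n) for category, p in proportions.items()}

# The mode is the category with the highest frequency.
mode_category = counts.most_common(1)[0][0]

# The one figure quoted in the text: 23.3 of 116.3 million households.
married_with_children = 23.3 / 116.3

print(proportions, percentages, mode_category, round(married_with_children, 2))
```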

Table 3.1 is a "frequency distribution".

A frequency distribution is a listing of possible values for a variable, together with the number of observations at each value.

When the table shows the proportions or percentages instead of the numbers, it is called a relative frequency distribution.

FREQUENCY DISTRIBUTIONS AND BAR GRAPHS: CATEGORICAL DATA

A bar graph has a rectangular bar drawn over each category. The height of the bar shows the frequency or relative frequency in that category. Figure 3.1 is a bar graph for the data in Table 3.1. The bars are separated to emphasize that the variable is categorical rather than quantitative. Since household structure is a nominal variable, there is no particular natural order for the bars. The order of presentation for an ordinal variable is the natural ordering of the categories.

Another type of graph, the pie chart, is a circle having a "slice of the pie" for each category. The size of a slice represents the percentage of observations in the category. A bar graph is more precise than a pie chart for visual comparison of categories with similar relative frequencies.
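To make the comparison concrete, the sketch below draws a plain-text bar graph and computes the angle each category would occupy in a pie chart. The function names `text_bar_graph` and `slice_angles` and the frequencies are hypothetical, chosen so that two categories have similar relative frequencies.

```python
def text_bar_graph(frequencies, width=40):
    """Return one text bar per category, scaled so the largest bar spans `width`."""
    largest = max(frequencies.values())
    label_width = max(len(name) for name in frequencies)
    lines = []
    for name, count in frequencies.items():
        bar = "#" * round(width * count / largest)
        lines.append(f"{name.ljust(label_width)} | {bar} {count}")
    return lines


def slice_angles(frequencies):
    """Return the pie-chart angle in degrees for each category."""
    total = sum(frequencies.values())
    return {name: 360 * count / total for name, count in frequencies.items()}


# Hypothetical frequencies; categories A and B are close together.
frequencies = {"A": 20, "B": 19, "C": 8}

for line in text_bar_graph(frequencies):
    print(line)
print(slice_angles(frequencies))
```

The largest slice of the pie belongs to the category with the highest frequency, which is the mode, but when two slices differ by only a few degrees the difference is easier to see as bar lengths.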

FREQUENCY DISTRIBUTIONS: QUANTITATIVE DATA

Frequency distributions and graphs also are useful for quantitative variables. The next example illustrates this.

Table 3.2 lists all 50 states in the United States and their 2015 violent crime rates. This rate measures the number of violent crimes in that state per 10,000 population. For instance, if a state had 12,000 violent crimes and a population size of 2,300,000, its violent crime rate was (12,000/2,300,000) × 10,000 ≈ 52. Tables, graphs, and numerical measures help us absorb the information in these data.
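The rate calculation in the example can be written as a small function. This is a sketch; the name `rate_per_10000` is an assumption, and the inputs are the illustrative figures from the text.

```python
def rate_per_10000(events, population):
    """Number of events per 10,000 members of the population."""
    return events / population * 10_000


# The illustrative figures from the text: 12,000 crimes, 2,300,000 people.
rate = rate_per_10000(12_000, 2_300_000)
print(round(rate))
```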

Table 3.3 also shows the relative frequencies, using proportions and percentages. As with any summary method, we lose some information as the cost of achieving some clarity. The frequency distribution does not show the exact violent crime rates or identify which states have low or high rates.

The intervals of values in frequency distributions are usually of equal width. The width equals 10 in Table 3.3. The intervals should include all possible values of the variable. In addition, any possible value must fit into one and only one interval; that is, they should be mutually exclusive.
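One way to build such a frequency distribution is sketched below. The rates are hypothetical (not the Table 3.2 data), and the sketch assumes half-open intervals that include the lower bound and exclude the upper bound, so every value falls into exactly one interval; Table 3.3 may define its boundaries differently.

```python
from collections import Counter


def interval_for(value, width=10):
    """Return (lower, upper) for the interval containing value.

    The lower bound is included and the upper bound is excluded.
    """
    lower = int(value // width) * width
    return (lower, lower + width)


# Hypothetical rates, not the Table 3.2 data.
rates = [12, 19, 20, 27, 35, 38, 41, 44, 49, 63]

frequency = Counter(interval_for(r) for r in rates)
relative = {
    interval: count / len(rates)
    for interval, count in sorted(frequency.items())
}

for interval, share in relative.items():
    print(interval, frequency[interval], share)
```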

## 1. How to describe data graphically?

Statistics are used in many aspects of our daily lives: to predict or forecast sales of a new product, the weather, grade point averages, and so on. We constantly need to absorb and interpret substantial amounts of data. However, once the data are collected, what should we do with them? How do data impact decision making? Generally, statistics help us to make sense of data. This first chapter introduces graphical ways of presenting data that make the data easier to understand. Examples of such displays are tables, bar charts, pie charts, histograms, and stem-and-leaf displays.

## How to make decisions in an uncertain environment?

Oftentimes, decisions are based on limited information. Suppose, for instance, that one is interested in bringing a new product to the market. Before doing so, the manufacturer wants to undertake a market research survey to assess the potential level of demand. While the manufacturer is interested in all potential buyers (population), this group is often too large to analyze. Collecting data for the entire population is impossible or prohibitively expensive. Therefore, a representative subgroup of the population (sample) is needed.

### Sample and population

A population is defined as the complete set of all items (observations) that one is interested in. The population size is denoted by N and can be very large, at times even infinite. A sample is defined as an observed subset of the population. The sample size is denoted by n.

### Sampling

There are different ways to obtain a representative subgroup (sample) of the population. This process is called sampling. For instance, simple random sampling (SRS) can be conducted. SRS is a procedure to select a sample of n objects (individuals) in such a way that each member of the population is chosen purely by chance. The selection of one member does not influence the chance that another member is selected. In other words, each observation (member/individual) has an equal chance of being included in the sample. SRS is so common that the adjective simple is often dropped, and the resulting sample is commonly called a random sample. A second way of sampling is called systematic sampling. For systematic sampling, the population list is arranged in some manner unconnected with the subject of interest. Every jth item in the population is then selected, where j is the ratio of the population size N to the desired sample size n, that is: j = N / n. The first item to be included is randomly selected. Systematic samples provide a good representation of the population if there is no cyclical variation in the population.
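Both procedures can be sketched with Python's standard `random` module. The function names and the population list are hypothetical. The systematic version assumes that N / n is rounded down to a whole number when it does not divide evenly, and that the random start is drawn from the first j items; the text does not specify either detail.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Select n members purely by chance, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Select every j-th member, with j = N // n and a random starting point."""
    rng = random.Random(seed)
    j = len(population) // n   # sampling interval
    start = rng.randrange(j)   # random position among the first j items
    return [population[start + k * j] for k in range(n)]


# Hypothetical population list with N = 100.
population = list(range(1, 101))

srs = simple_random_sample(population, 10, seed=1)
sys_sample = systematic_sample(population, 10, seed=1)
print(srs)
print(sys_sample)
```

If the population list had a repeating pattern with the same period as j, the systematic sample would pick the same phase of the cycle every time, which is why cyclical variation is a concern.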

### Parameter and statistic

A parameter is defined as a measure that describes a population characteristic. A statistic is defined as a numerical measure that describes a sample characteristic. For instance, if we measure the average IQ of 500 registered voters, this average is called a statistic. If, for some reason, we are able to calculate the average IQ of the entire population, this resulting average would be called a parameter.

In practice, we are commonly unable to directly measure the parameters of interest. Therefore, we use statistics to gain some understanding of the population values. We must, however, realize that there is always some element of uncertainty involved, as we do not know the exact population value. There are two sources of error that influence this uncertainty. First, sampling error is due to the fact that information is available on only a subset of the population members (discussed in more detail in chapters 6, 7, and 8). Second, nonsampling error is unconnected to the sampling procedure used. Examples of nonsampling error are: the population that is sampled is actually not the relevant one; survey participants may give inaccurate or dishonest answers; survey subjects may not respond at all to (certain) questions.
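The difference between a parameter and a statistic, and the sampling error that separates them, can be illustrated with simulated data. Everything in this sketch is an assumption made for illustration: the population is generated artificially (it is not real IQ data), and in practice the parameter would be unknown.

```python
import random
from statistics import mean

rng = random.Random(0)

# Hypothetical population of 10,000 scores, generated for illustration only.
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Parameter: describes the whole population (normally not observable).
parameter = mean(population)

# Statistic: describes a random sample of 500 members.
sample = rng.sample(population, 500)
statistic = mean(sample)

# Sampling error: the gap that arises because only a subset was observed.
sampling_error = statistic - parameter
print(parameter, statistic, sampling_error)
```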

## How to think statistically?

Statistical thinking begins with problem definition:

- What information is required?
- What is the population of interest?
- How should sample members be selected?
- How should information from the sample members be obtained?

After answering these questions, the next question is how to use sample information to make decisions about the population. This requires both descriptive statistics and inferential statistics. Descriptive statistics focus on graphical and numerical procedures; they are used to summarize and process data. Inferential statistics then use the data to make the predictions, forecasts, and estimates on which decisions are based.

## What is a variable and what are the measurement levels of a variable?

A variable is a characteristic of an individual or object. Examples are age and weight.
