Introduction to Statistics
Population (Universe): The entire group under consideration. It is the full set of data
that we have not completely examined but to which our conclusions refer.
Sample: That portion of the population that is
available, or is to be made available, for analysis.
Parameter: A characteristic of a population. E.g., the population mean, µ.
Statistic: A measure derived from the sample data. E.g., the sample mean, X̄ (X-bar).
Statistical Inference: The process of using sample statistics to
draw conclusions about population parameters.
For instance, using the sample mean (based on a sample of, say, n=1000) to draw conclusions
about µ (population of, say, 300 million).
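The idea of using a statistic to estimate a parameter can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population below is synthetic (100,000 randomly generated values, not real data), and the names `population`, `mu`, and `x_bar` are chosen here for the example.

```python
import random
import statistics

# Hypothetical population: 100,000 synthetic values (not real data).
rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(100_000)]

# Parameter: the population mean (normally unknown in practice).
mu = statistics.fmean(population)

# Statistic: the mean of a random sample of n = 1000 drawn without replacement.
sample = rng.sample(population, 1000)
x_bar = statistics.fmean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

In this sketch the sample mean should land close to the population mean even though the sample covers only 1% of the population, which is the point of statistical inference.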
Descriptive Statistics: Those statistics that summarize a sample of
numerical data in terms of averages and other measures for the purpose of
description. This includes the
presentation of data in the form of graphs, charts, and tables. Descriptive statistics are not concerned with
the theory and methodology for drawing inferences that extend beyond the
particular set of data examined.
Primary data: Data compiled by the researcher.
Secondary data: Data compiled or published elsewhere, e.g., Statistical Abstracts, census data.
Qualitative data result in categorical responses.
Quantitative data result in numerical responses, and may be discrete or continuous.
Discrete data arise from a counting process.
Continuous data arise from a measuring process.
Probability Sample: A sample collected in such a way that every
element in the population has a known chance of being selected.
Simple Random Sample: A sample collected in such a way that every
element in the population has an equal chance of being selected, and every
possible sample of a given size is equally likely to be chosen.
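A simple random sample can be drawn with Python's standard library. The sketch below assumes a hypothetical population of 20 numbered elements; the seed is fixed only so the draw is repeatable.

```python
import random

# Hypothetical population: 20 elements identified by the numbers 1..20.
population = list(range(1, 21))

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Draw n = 5 elements without replacement. Each element has the same
# chance (5/20) of ending up in the sample.
sample = rng.sample(population, 5)
print(sample)
```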
A. Sources of data
1. Primary
a. Surveys
i. mail
ii. telephone
iii. personal interview
2. Secondary
B. Survey errors
1. Response errors
a. Subject lies
b. Subject makes mistakes
c. Interviewer makes mistakes
d. Interviewer cheating
e. Interviewer effects
2. Nonresponse error
C. Types of samples
1. Nonprobability samples
a. Convenience (chunk) sample
b. Judgment sample
c. Quota sample
2. Probability samples
a. Simple random sample
b. Other types of probability samples
i. systematic sample
ii. stratified sample
iii. cluster sample
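The three other probability samples listed above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the sampling frame is hypothetical (100 records split into two strata, "A" and "B"), and the function names are invented for this example. It assumes the usual textbook meanings: a systematic sample takes every k-th element after a random start, a stratified sample draws randomly within each stratum, and a cluster sample selects whole groups at random.

```python
import random

rng = random.Random(1)

# Hypothetical sampling frame: 100 records, each tagged with a stratum.
frame = [{"id": i, "stratum": "A" if i < 60 else "B"} for i in range(100)]

def systematic_sample(items, n, rng):
    """Pick a random start, then take every k-th element (k = N // n)."""
    k = len(items) // n
    start = rng.randrange(k)
    return items[start::k][:n]

def stratified_sample(items, key, fraction, rng):
    """Draw a simple random sample of the same fraction within each stratum."""
    strata = {}
    for item in items:
        strata.setdefault(item[key], []).append(item)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, round(len(members) * fraction)))
    return chosen

def cluster_sample(items, cluster_size, n_clusters, rng):
    """Split the frame into clusters, then keep every element of the chosen clusters."""
    clusters = [items[i:i + cluster_size] for i in range(0, len(items), cluster_size)]
    picked = rng.sample(clusters, n_clusters)
    return [item for cluster in picked for item in cluster]

sys_sample = systematic_sample(frame, 10, rng)
strat_sample = stratified_sample(frame, "stratum", 0.10, rng)
clus_sample = cluster_sample(frame, 10, 2, rng)

print([r["id"] for r in sys_sample])
print([r["id"] for r in strat_sample])
print([r["id"] for r in clus_sample])
```

In all three cases each element's chance of selection is known in advance, which is what makes them probability samples.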
D. Types of data
1. Qualitative
a. Nominal
2. Quantitative
a. Discrete vs. Continuous
b. Ordinal, Interval, Ratio
Frequency Distribution: A table that groups the data into classes and records
the number of observations that fall into each class. A Percentage
Distribution records the percentage of the observations that fall into each
class.
Example: A (fictitious) sample was taken of 200 graduate
students at CUNY. Each was asked for his or her weekly salary. The pathetic
responses ranged from about $520 to $590.
If we wanted to display the data in, say, 7 equal intervals, we would
use an interval width of $10.
width of interval = range / number of classes = $70 / 7 = $10 per class.
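The same arithmetic, written as a short Python sketch. The variable names are chosen here for illustration; the figures are the ones from the example.

```python
low, high = 520, 590   # approximate smallest and largest responses, in dollars
num_classes = 7

# width of interval = range / number of classes
width = (high - low) / num_classes

# Class boundaries: the lower limit of each class plus the final upper limit.
boundaries = [low + i * width for i in range(num_classes + 1)]

print(width)        # 10.0
print(boundaries)   # [520.0, 530.0, 540.0, 550.0, 560.0, 570.0, 580.0, 590.0]
```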
Frequency Distribution and Percentage Distribution
Weekly earnings ($)    Frequency    Percentage
520 and under 530           6           3 %
530 and under 540          30          15
540 and under 550          38          19
550 and under 560          52          26
560 and under 570          42          21
570 and under 580          24          12
580 to 590                  8           4
Total                 n = 200         100 %
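Each percentage in the table is the class frequency divided by n, times 100. A minimal Python sketch using the frequencies from the table (the class labels and variable names are chosen here for illustration):

```python
# Class frequencies from the weekly-earnings example.
frequencies = {
    "520 and under 530": 6,
    "530 and under 540": 30,
    "540 and under 550": 38,
    "550 and under 560": 52,
    "560 and under 570": 42,
    "570 and under 580": 24,
    "580 to 590": 8,
}

n = sum(frequencies.values())  # total number of observations

# Percentage distribution: each frequency as a share of n.
percentages = {label: 100 * f / n for label, f in frequencies.items()}

for label, f in frequencies.items():
    print(f"{label:<20}{f:>5}{percentages[label]:>8.0f} %")
```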
A Cumulative Distribution
focuses on the number or percentage of cases that lie below or above specified
values rather than within intervals.
Cumulative Frequency Distribution and Cumulative Percentage Distribution
Weekly earnings ($)    Cumulative frequency    Cumulative percentage
less than 520                    0                      0 %
less than 530                    6                      3
less than 540                   36                     18
less than 550                   74                     37
less than 560                  126                     63
less than 570                  168                     84
less than 580                  192                     96
less than 590                  200                    100
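The cumulative figures are running totals of the class frequencies. A short Python sketch, using the frequencies from the weekly-earnings example (variable names are illustrative):

```python
from itertools import accumulate

# Upper limit of each class and the class frequencies from the example.
upper_limits = [530, 540, 550, 560, 570, 580, 590]
frequencies = [6, 30, 38, 52, 42, 24, 8]

n = sum(frequencies)

# Running totals give the number of cases below each upper limit.
cumulative_freq = list(accumulate(frequencies))
cumulative_pct = [100 * c / n for c in cumulative_freq]

for limit, c, p in zip(upper_limits, cumulative_freq, cumulative_pct):
    print(f"less than {limit}: {c:>4} {p:>6.0f} %")
```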
A. Measures of Location
1. Measures of central tendency
a. Mean
b. Median
c. Mode
2. Quantiles ‑ measures of noncentral tendency
a. Quartiles
b. Percentiles
B. Measures of Dispersion
1. Range
2. Interquartile range
3. Variance
4. Standard Deviation
5. Coefficient of Variation
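The measures listed above can be computed with Python's standard `statistics` module. The sketch below uses a small hypothetical sample of seven values, invented purely for illustration. Two assumptions are worth flagging: the variance and standard deviation are the sample versions (dividing by n - 1), and the quartiles follow the module's default "exclusive" method, so other textbooks or packages may give slightly different quartile values.

```python
import statistics

# Hypothetical sample, invented for illustration only.
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Quantiles: the three quartile cut points (default "exclusive" method)
q1, q2, q3 = statistics.quantiles(data, n=4)

# Measures of dispersion
data_range = max(data) - min(data)
iqr = q3 - q1
variance = statistics.variance(data)   # sample variance, divides by n - 1
std_dev = statistics.stdev(data)       # square root of the sample variance
coeff_var = 100 * std_dev / mean       # standard deviation as a percent of the mean

print(f"mean={mean}, median={median}, mode={mode}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, range={data_range}")
print(f"variance={variance}, std dev={std_dev:.3f}, CV={coeff_var:.1f} %")
```

The coefficient of variation is unit-free, so it can be used to compare the spread of data sets measured on different scales.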