Introduction to Statistics

 

 

Population:  The universe, i.e., the total category under consideration.  It is the full set of data that we have not completely examined but to which our conclusions refer.

 

Sample:  That portion of the population that is available, or is to be made available, for analysis.

 

Parameter:  A characteristic of a population.  E.g., the population mean, µ.

 

Statistic:  A measure derived from the sample data.  E.g., the sample mean, X̄ (read "X-bar").

 

Statistical Inference:  The process of using sample statistics to draw conclusions about population parameters.  For instance, using the sample mean (based on a sample of, say, n=1000) to draw conclusions about µ (population of, say, 300 million).
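The idea can be sketched in a few lines of Python.  This is only an illustration: the population below is hypothetical (100,000 simulated values standing in for a much larger population), and the names population, mu, sample, and x_bar are chosen for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 made-up values (a stand-in for a much larger one).
population = [random.gauss(50, 10) for _ in range(100_000)]
mu = statistics.mean(population)        # parameter: the population mean

# Draw a sample of n = 1000 without replacement.
sample = random.sample(population, 1000)
x_bar = statistics.mean(sample)         # statistic: the sample mean

print(f"population mean (mu): {mu:.2f}")
print(f"sample mean (x-bar):  {x_bar:.2f}")
```

In practice µ is unknown; it is computed here only so that the sample mean can be compared with it.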

 

Descriptive Statistics:  Those statistics that summarize a sample of numerical data in terms of averages and other measures for the purpose of description.  This includes the presentation of data in the form of graphs, charts, and tables.  Descriptive statistics are not concerned with the theory and methodology for drawing inferences that extend beyond the particular set of data examined.

 

Primary data:  Data compiled by the researcher.  Secondary data:  Data compiled or published elsewhere, e.g., Statistical Abstracts, census data.

 

Qualitative data result in categorical responses.  Quantitative data result in numerical responses, and may be discrete or continuous.  Discrete data arise from a counting process.  Continuous data arise from a measuring process.

 

Probability Sample:  A sample collected in such a way that every element in the population has a known chance of being selected.

 

Simple Random Sample:  A probability sample collected in such a way that every element in the population has an equal chance of being selected.  More precisely, every possible sample of size n is equally likely to be chosen.
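A minimal sketch of drawing a simple random sample, assuming the population can be listed as a numbered sampling frame.  The frame of 20 elements and the sample size of 5 are hypothetical.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

population = list(range(1, 21))   # hypothetical sampling frame: elements numbered 1..20
n = 5

# random.sample draws without replacement, so no element can be picked twice.
srs = random.sample(population, n)
print(srs)
```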

 

 


 

A.  Sources of data

                        1.  Primary

                                     a. Surveys

                                                 i.    mail

                                                 ii.   telephone

                                                 iii.  personal interview

                        2.  Secondary

 

 

B.  Survey errors

                        1.  Response errors

                                     a.  Subject lies

                                     b.  Subject makes mistakes

                                     c.  Interviewer makes mistakes

                                     d.  Interviewer cheating

                                     e.  Interviewer effects

                        2.  Nonresponse error

 

 

C.  Types of samples

                        1.  Nonprobability samples

                                     a.  Convenience (chunk) sample

                                     b.  Judgment sample

                                     c.  Quota sample

                        2.  Probability samples

                                     a.  Simple random sample

                                     b.  Other types of probability samples

                                                 i.    systematic sample

                                                 ii.   stratified sample

                                                 iii.  cluster sample
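The three other probability designs listed under C.2.b can be sketched as follows.  The frame of 100 elements and the group labels A to D are hypothetical, and each design is reduced to its simplest form.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical frame: 100 elements, each tagged with a made-up group label.
frame = [{"id": i, "group": "ABCD"[i % 4]} for i in range(100)]

# Systematic sample: random start, then every k-th element.
k = 10
start = random.randrange(k)
systematic = frame[start::k]

# Stratified sample: a simple random sample drawn within every group (stratum).
groups = {}
for element in frame:
    groups.setdefault(element["group"], []).append(element)
stratified = [e for members in groups.values() for e in random.sample(members, 3)]

# Cluster sample: randomly pick whole groups and keep every element in them.
chosen = random.sample(sorted(groups), 2)
cluster = [e for g in chosen for e in groups[g]]

print(len(systematic), len(stratified), len(cluster))
```

The same grouping is used as strata in one design and as clusters in the other: stratified sampling takes some elements from every group, while cluster sampling takes every element from some groups.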

 

 

D.  Types of data

                        1.  Qualitative

                                     a.  Nominal

                        2.  Quantitative

                                     a.  Discrete vs. Continuous

                                     b.  Ordinal, Interval, Ratio

 


Frequency Distribution:  Records data grouped into classes and the number of observations that fell into each class.  A Percentage Distribution records the percent of the observations that fell into each class.

 

Example:  A (fictitious) sample was taken of 200 graduate students at CUNY.  Each was asked for his or her weekly salary.  The pathetic responses ranged from about $520 to $590, a range of $70.  If we wanted to display the data in, say, 7 equal intervals, we would use an interval width of $10.

 

width of interval   = range/number of classes

                    = $70/7 = $10/class.

 

Frequency Distribution and Percentage Distribution

Weekly earnings ($)      Frequency      Percentage
520 and under 530             6              3 %
530 and under 540            30             15
540 and under 550            38             19
550 and under 560            52             26
560 and under 570            42             21
570 and under 580            24             12
580 to 590                    8              4
                     n =    200            100 %
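A sketch of how raw responses could be tallied into classes like these.  The 200 individual responses are not given in these notes, so the salaries list below is hypothetical, and frequency_distribution is an illustrative helper, not a library function.

```python
def frequency_distribution(values, lower, width, num_classes):
    """Count values per class; the last class includes its upper limit."""
    counts = [0] * num_classes
    for v in values:
        # Class index, capped so the maximum value falls in the last class.
        index = min(int((v - lower) // width), num_classes - 1)
        counts[index] += 1
    return counts

# Hypothetical weekly salaries (made up for illustration, not the 200 responses).
salaries = [520, 527, 533, 538, 541, 549, 552, 555, 558,
            563, 566, 571, 579, 584, 590]

low, high, num_classes = 520, 590, 7
width = (high - low) / num_classes          # range / number of classes = 70 / 7

counts = frequency_distribution(salaries, low, width, num_classes)
percents = [100 * c / len(salaries) for c in counts]

for i, (c, p) in enumerate(zip(counts, percents)):
    print(f"{low + i * width:.0f} to {low + (i + 1) * width:.0f}: {c:3d} {p:5.1f}%")
```

The helper assumes every value lies between the lowest and highest class limits; values outside that span would need to be handled separately.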

 

A Cumulative Distribution focuses on the number or percentage of cases that lie below or above specified values rather than within intervals.

 

Cumulative Frequency Distribution and Cumulative Percentage Distribution

Weekly earnings ($)      Cumulative frequency      Cumulative percentage
less than 520                      0                        0 %
less than 530                      6                        3
less than 540                     36                       18
less than 550                     74                       37
less than 560                    126                       63
less than 570                    168                       84
less than 580                    192                       96
less than 590                    200                      100

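The cumulative figures are running totals of the class frequencies.  A short sketch, using the frequencies from the weekly earnings example; the variable names are chosen for illustration.

```python
from itertools import accumulate

# Class frequencies from the weekly earnings frequency distribution.
frequencies = [6, 30, 38, 52, 42, 24, 8]
upper_limits = [530, 540, 550, 560, 570, 580, 590]
n = sum(frequencies)                                  # 200

cumulative = list(accumulate(frequencies))            # running totals
cumulative_pct = [100 * c / n for c in cumulative]

for limit, c, p in zip(upper_limits, cumulative, cumulative_pct):
    print(f"less than {limit}: {c:3d} {p:5.1f}%")
```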
 

 


A.  Measures of Location

                        1.  Measures of central tendency

                                     a.  Mean

                                     b.  Median

                                     c.  Mode

                        2.  Quantiles - measures of noncentral tendency

                                     a.  Quartiles

                                     b.  Percentiles

 

B.  Measures of Dispersion

                        1.  Range

                        2.  Interquartile range

                        3.  Variance

                        4.  Standard Deviation

                        5.  Coefficient of Variation
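The measures in this outline can be computed with Python's standard statistics module.  The data below are hypothetical, and the quartile values depend on the convention used (textbooks differ); this sketch uses the module's default method.

```python
import statistics

# Hypothetical sample (made-up values for illustration).
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of location
mean = statistics.mean(data)                    # 6
median = statistics.median(data)                # 5
mode = statistics.mode(data)                    # 4
q1, q2, q3 = statistics.quantiles(data, n=4)    # quartiles (default method)

# Measures of dispersion
data_range = max(data) - min(data)              # largest minus smallest value
iqr = q3 - q1                                   # interquartile range
variance = statistics.variance(data)            # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)                # sample standard deviation
cv = std_dev / mean * 100                       # coefficient of variation, in percent

print(mean, median, mode, q1, q3)
print(data_range, iqr, variance, round(std_dev, 2), round(cv, 1))
```

For population rather than sample figures, statistics.pvariance and statistics.pstdev divide by n instead of n - 1.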