Discrete and Continuous Variables were defined in the article An Introduction to Frequency Distributions. We shall continue our discussion on frequency distributions in this article by moving on to Frequency Distributions of Discrete and Continuous Variables.
Frequency Distribution of a Discrete Variable
Since a discrete variable can take only certain isolated (discrete) values within its range of variation, it is natural to take a separate class for each distinct value of the variable, as shown in the following example relating to the daily number of car accidents during the 30 days of a month.
3 4 4 5 5 3
4 3 5 7 6 4
4 3 4 5 5 5
5 5 3 5 6 4
5 4 4 6 5 6
Table No. 2: Showing frequency distribution for daily number of car accidents during a month.
|Number of car accidents|Frequency|
|---|---|
|3|5|
|4|9|
|5|11|
|6|4|
|7|1|
|Total|30|
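The tabulation for a discrete variable can also be done programmatically. The following is a minimal sketch in Python using `collections.Counter`; the variable names `accidents` and `frequency` are illustrative choices and not part of the original article.

```python
from collections import Counter

# Daily number of car accidents over 30 days (data from the example above).
accidents = [
    3, 4, 4, 5, 5, 3,
    4, 3, 5, 7, 6, 4,
    4, 3, 4, 5, 5, 5,
    5, 5, 3, 5, 6, 4,
    5, 4, 4, 6, 5, 6,
]

# One class per distinct value: count how often each value occurs.
frequency = Counter(accidents)

# Print the distribution in increasing order of the variable.
for value in sorted(frequency):
    print(value, frequency[value])

# The class frequencies must add up to the number of observations.
print("Total", sum(frequency.values()))
```

Because each distinct value forms its own class, no decision about class width is needed here.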
Frequency Distribution of a Continuous Variable
For a continuous variable if we take a class for each distinct value of the variable, the number of classes will become unduly large, thus defeating the purpose of tabulation. In fact, since a continuous variable can assume an infinite number of values within its range of variation, the classification or sub-division of such data is necessarily artificial. Some guidelines that should be followed while dividing continuous data into classes are as follows:
- The classes should be mutually exclusive, i.e., non-overlapping. No two classes should contain the same interval of values of the variable.
- The classes should be exhaustive, i.e., they must cover the entire range of the data.
- The number of classes and the width of each class should neither be too small nor too large. In other words, there should be relatively fewer classes if the difference between the least value of the variable and its highest value is small and relatively more classes if the same difference is large. This difference between the least value of the variable and the greatest value of the variable is called the range of the variable or the data set.
- The classes should, preferably, be of equal width.
Let us consider the following example regarding daily maximum temperatures in a city for 50 days.
28 28 31 29 35 33 28 31 34 29
25 27 29 33 30 31 32 26 26 21
21 20 22 24 28 30 34 33 35 29
23 21 20 19 19 18 19 17 20 19
18 18 19 27 17 18 20 21 18 19
Maximum value = 35
Minimum value = 17
Range = 35 - 17 = 18
Number of classes = 5 (say)
Width of each class = 18/5 = 3.6, rounded up to 4
Table No. 3: Showing frequency distribution of temperature in a city for 50 days.
|Class Intervals (Temperatures)|Frequency|
|---|---|
|17-20|17|
|21-24|7|
|25-28|9|
|29-32|10|
|33-36|7|
|Total|50|
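The grouping of the temperature data can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are whole numbers, the first class starts at the minimum value, the number of classes (5) is the analyst's choice, and the width is the range divided by the number of classes, rounded up. The names `temperatures`, `classes` and `frequency` are hypothetical.

```python
import math

# Daily maximum temperatures for 50 days (data from the example above).
temperatures = [
    28, 28, 31, 29, 35, 33, 28, 31, 34, 29,
    25, 27, 29, 33, 30, 31, 32, 26, 26, 21,
    21, 20, 22, 24, 28, 30, 34, 33, 35, 29,
    23, 21, 20, 19, 19, 18, 19, 17, 20, 19,
    18, 18, 19, 27, 17, 18, 20, 21, 18, 19,
]

minimum = min(temperatures)
maximum = max(temperatures)
data_range = maximum - minimum      # range of the data set

num_classes = 5                     # assumed choice, as in the text ("say")
width = math.ceil(data_range / num_classes)  # round up so all data are covered

# Class limits for whole-number data: (17, 20), (21, 24), ...
classes = [
    (minimum + i * width, minimum + (i + 1) * width - 1)
    for i in range(num_classes)
]

# Count the observations falling in each class.
frequency = {limits: 0 for limits in classes}
for t in temperatures:
    index = (t - minimum) // width  # position of the class containing t
    frequency[classes[index]] += 1

for (lower, upper), count in frequency.items():
    print(f"{lower}-{upper}: {count}")
```

The classes built this way are mutually exclusive, exhaustive and of equal width, in line with the guidelines listed earlier.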
Defining a few terms
Class Interval: The whole range of values of the variable is divided into groups in the form of intervals. Each interval is called a class interval.
Class Frequency: The number of observations in a class is termed as the frequency of the class or class frequency.
Relative Frequency: Relative frequency is defined as the proportion of observations corresponding to a particular value of the variable or a class of values of the variable. Relative frequency of a particular value of the variable or a class of values of the variable is obtained by dividing the frequency corresponding to that particular value or that particular class by the total number of observations in the data set, i.e., the total frequency.
Relative frequency of any value or any class lies between 0 and 1. We calculate relative frequency if we want an idea about the relative importance of the particular value or class in relation to the total frequency.
Percent Frequency: Relative frequency is sometimes expressed as a percentage, i.e.,
$$\text{Percent frequency} = \text{Relative frequency} \times 100$$
which is known as percent frequency. Expressed as percentages, the percent frequencies of all the values or all the classes must add up to 100.
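As an illustration of the two definitions above, the following Python sketch computes relative and percent frequencies for the car accident data. The names `relative` and `percent` are illustrative, not taken from the article.

```python
from collections import Counter

# Daily number of car accidents over 30 days (data from the first example).
accidents = [
    3, 4, 4, 5, 5, 3,
    4, 3, 5, 7, 6, 4,
    4, 3, 4, 5, 5, 5,
    5, 5, 3, 5, 6, 4,
    5, 4, 4, 6, 5, 6,
]

frequency = Counter(accidents)
total = sum(frequency.values())     # total frequency

# Relative frequency = class frequency / total frequency.
relative = {value: f / total for value, f in frequency.items()}

# Percent frequency = relative frequency * 100.
percent = {value: 100 * r for value, r in relative.items()}

for value in sorted(frequency):
    print(value, frequency[value], round(relative[value], 3), round(percent[value], 1))
```

Up to floating-point rounding, the relative frequencies add up to 1 and the percent frequencies add up to 100.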
Class limits and Class boundaries:
Class limits are the two endpoints of a class interval which are used for the construction of a frequency distribution. The lowest value of the variable that can be included in a class interval is called the lower class limit of that class interval. The highest value of the variable that can be included in a class interval is called the upper class limit of that class interval. These are not the real limits or endpoints of a class interval. Hence, class limits are called the apparent limits of a class.
Let us take Table No. 3 as an example. The class intervals are 17-20, 21-24, 25-28, 29-32 and 33-36. Here, for the class 17-20, the lower class limit is 17 and the upper class limit is 20. However, if there were an observation of 20.5, it would not be included in this class; it would be included in the class 21-24. Again, if there were an observation of 16.5, it would be included in the class 17-20. Hence, effectively, the two actual endpoints of the class 17-20 are 16.5 and 20.5. These are the actual or true limits of the class.
The two real endpoints of a class interval are called class boundaries. These are also called the real class limits. The basic rule is that class limits should have the same number of decimal places as the data, while class boundaries should have one decimal place more. For example, suppose we have the following data on the weights of a group of students (in kg): 50.5, 50.8, 63.6, 48.4, 58.6 and 60.2. Here the class limits should have one decimal place and the class boundaries two decimal places.
We obtain class boundaries from class limits by dividing the difference between the upper limit of a class and the lower limit of the next higher class into two equal parts. Say we are considering the classes 17-20 and 21-24. Here 21 - 20 = 1, and 1/2 = 0.5. We add 0.5 to the upper class limit of each class and subtract 0.5 from the lower class limit of each class. So the class boundaries are 16.5-20.5, 20.5-24.5 and so on. For the class 16.5-20.5, 16.5 is the lower class boundary and 20.5 is the upper class boundary.
Note that the upper class boundary of a class coincides with the lower class boundary of the next higher class. This can cause confusion: where should we include an observation that reads 20.5? The answer is that 20.5 should be included in the class 20.5-24.5. The rule is that the frequency of a class denotes the number of observations ≥ the Lower Class Boundary but < the Upper Class Boundary. This is called the method of left inclusion. By including the lower class boundary but not the upper class boundary in the class, we eliminate confusion about where to place an observation that happens to fall on a class boundary point.
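The conversion from class limits to class boundaries, and the method of left inclusion, can be sketched in Python as follows. The function name `find_class` and the variable names are hypothetical, and the sketch assumes that the gap between consecutive classes is the same throughout.

```python
# Class limits of the temperature example.
class_limits = [(17, 20), (21, 24), (25, 28), (29, 32), (33, 36)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = class_limits[1][0] - class_limits[0][1]   # 21 - 20 = 1
half_gap = gap / 2                              # 0.5

# Subtract half the gap from each lower limit and add it to each upper limit.
class_boundaries = [(lower - half_gap, upper + half_gap)
                    for lower, upper in class_limits]


def find_class(x, boundaries):
    """Return the class containing x under left inclusion:
    lower class boundary <= x < upper class boundary."""
    for lower, upper in boundaries:
        if lower <= x < upper:
            return (lower, upper)
    return None  # x lies outside every class


print(class_boundaries)
print(find_class(20.5, class_boundaries))  # falls in the class 20.5-24.5
print(find_class(16.5, class_boundaries))  # falls in the class 16.5-20.5
```

Using `<=` on the lower boundary and `<` on the upper boundary guarantees that a value lying exactly on a boundary is counted in exactly one class.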
Open-end classes: Some values in a data set may be extremely small, and some extremely large, compared with the other values. In such cases we do not specify the lower limit of the first class and the upper limit of the last class. Such classes are called open-end classes.
Class width: The length of a class is called the class width. It is also known as the class size. It is given by:
$$\text{Class width} = \text{U.C.B.} - \text{L.C.B.}$$
where U.C.B. is the Upper Class Boundary and L.C.B. is the Lower Class Boundary.
Class mark: The midpoint of a class interval is called the class mark. It is the representative value of the entire class.
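A short Python sketch of class width and class mark for the class 17-20 (boundaries 16.5-20.5) is given below. It assumes the usual definitions: class width is the upper class boundary minus the lower class boundary, and class mark is the average of the two endpoints of the class. The function names are illustrative.

```python
def class_width(lower_boundary, upper_boundary):
    """Class width: upper class boundary minus lower class boundary."""
    return upper_boundary - lower_boundary


def class_mark(lower, upper):
    """Class mark: midpoint of the two endpoints of a class."""
    return (lower + upper) / 2


width = class_width(16.5, 20.5)                # uses the class boundaries
mark_from_limits = class_mark(17, 20)          # uses the class limits
mark_from_boundaries = class_mark(16.5, 20.5)  # uses the class boundaries

print(width, mark_from_limits, mark_from_boundaries)
```

In this example the class mark is the same whether it is computed from the class limits or from the class boundaries, because the boundaries extend the limits by the same amount (0.5) on both sides.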
Frequency Density: It is the frequency per unit width of the class. It is given by:
$$\text{Frequency density} = \frac{\text{Class frequency}}{\text{Class width}}$$
Frequency densities are essential for comparing classes of unequal width. For classes of equal width, frequency densities are proportional to the class frequencies.
Relative Frequency Density: The relative frequency density of a class is its relative frequency divided by the class width. It is given by:
$$\text{Relative frequency density} = \frac{\text{Relative frequency}}{\text{Class width}}$$
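The two densities can be illustrated with the temperature example. In the sketch below, the class frequencies are those obtained by counting the 50 temperature readings given earlier into the classes 17-20, 21-24, 25-28, 29-32 and 33-36; the variable names are hypothetical.

```python
# Class boundaries and class frequencies for the temperature example.
boundaries = [(16.5, 20.5), (20.5, 24.5), (24.5, 28.5), (28.5, 32.5), (32.5, 36.5)]
frequencies = [17, 7, 9, 10, 7]
total = sum(frequencies)            # total frequency = 50

frequency_density = []
relative_frequency_density = []
for (lcb, ucb), f in zip(boundaries, frequencies):
    width = ucb - lcb               # class width = U.C.B. - L.C.B.
    frequency_density.append(f / width)
    relative_frequency_density.append((f / total) / width)

for (lcb, ucb), fd, rfd in zip(boundaries, frequency_density,
                               relative_frequency_density):
    print(f"{lcb}-{ucb}: density {fd:.3f}, relative density {rfd:.4f}")
```

Since all five classes here have the same width, the frequency densities are simply the class frequencies divided by 4, which shows the proportionality mentioned above.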
1. Construct a frequency distribution of the variable ‘word length’ from the following:
“Row row row your boat gently down the stream,
Merrily merrily, merrily merrily, life is but a dream.”
Calculate the relative frequencies and the percent frequencies.
2. The following data are based on the responses of 50 employees of a certain office on the distances (in km) between their residences and workplace:
1.5 2.2 6.2 7.1 12.3 13.6 2.4 6.5 9.1 5.0
18.2 7.1 3.0 15.2 15.2 4.0 17.2 1.6 14.2 5.1
4.0 16.5 4.3 5.7 8.9 6.0 5.1 18.9 5.6 2.3
9.1 11.5 12.5 1.7 9.5 2.0 10.3 11.8 4.4 10.5
9.3 18.0 8.2 8.9 4.3 14.1 7.4 3.7 2.8 6.7
Construct the frequency distribution from this data.