After the last lesson, we have a large 2000-point data set that we are trying to make sense of. So far we have computed the maximum, minimum, and average of all of the data points. Not a bad start.
Suppose this is all you did. For the data given, you'll get an average of 541.851, a maximum of 619, and a minimum of 462. Are you really done? Do you really understand the numbers? Or are you missing something, namely: what does the distribution of the numbers look like?
What does this mean? Well, the first number in the data set is 557. As for the distribution, one may ask how many times 557 appears in the whole data set. What if 557 appeared 100 times and all of the other numbers appeared only once? Or what if each number appeared 5 times? Get what we mean by the distribution of the numbers?
Typically, we don't look at how many times a single number appears (this is too detailed). Instead, we look at how many times numbers in a small range appear. Here's an example.
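To make the idea of "how many times each number appears" concrete, here is a minimal Python sketch. The list `sample` is a made-up handful of values for illustration only (it is not the lesson's 2000-point data set), and `Counter` from Python's standard library does the tallying.

```python
from collections import Counter

# Hypothetical sample for illustration only; the real data set has 2000 points.
sample = [557, 560, 557, 541, 560, 557, 530]

# Tally how many times each distinct value occurs.
counts = Counter(sample)

# Show the tally from the smallest value to the largest.
for value in sorted(counts):
    print(value, "appears", counts[value], "time(s)")
```

With 2000 points, a per-value tally like this would probably give a long list of small counts, which is one reason to group values into ranges instead.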
In this data, the maximum number is 619 and the minimum is 462. Suppose we wanted to look at the occurrence of 10 groups of numbers. We'd compute the width of each group as
$$\Delta = \frac{619 - 462}{10} = 15.7$$
In other words, we'll look at groups of numbers $\Delta$ (or $15.7$) wide. This means we'll look for numbers in the range $462$ to $462+\Delta$, or $462$ to $477.7$. Next, we'll look for numbers in the range $477.7$ to $477.7+\Delta$, or $477.7$ to $493.4$. All told, we'll count how many times a number falls in each of these ranges:
462.0 to 477.7
477.7 to 493.4
493.4 to 509.1
509.1 to 524.8
524.8 to 540.5
540.5 to 556.2
556.2 to 571.9
571.9 to 587.6
587.6 to 603.3
603.3 to 619.0
These ranges are called "bins." Note how each is $\Delta$ (or $15.7$) wide.
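The ranges listed above can be reproduced with a few lines of Python. This is only a sketch: the names `data_min`, `data_max`, `low`, and `high` are chosen here for illustration, and the minimum and maximum are typed in directly rather than computed from the data.

```python
data_min = 462   # minimum of the data set, from the lesson
data_max = 619   # maximum of the data set, from the lesson
bins = 10        # number of bins we want

# Width of each bin.
delta = (data_max - data_min) / bins

# Print the low and high boundary of each bin, one bin per line.
for i in range(bins):
    low = data_min + i * delta
    high = low + delta
    print(f"{low:.1f} to {high:.1f}")
```

Changing `bins` to another value, such as 5 or 20, should give wider or narrower ranges that still start at 462 and end at 619.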
In this part of making a histogram, we'll compute the bin boundaries and display them on the screen.
Now you try. Set the bins variable to the number of bins you want to have and see if the bin boundaries come out right.
Here's a breakdown of the code thus far:
Part 1: Sets up a few things.
Part 2: Sets up things needed for the bins, including the number of bins, $\Delta$, and some arrays we'll need. count will be used when we actually make the histogram in the next part, and bin_low and bin_high will hold the low and high boundaries of each bin. So for example, for the 10 bins we proposed above, bin_low[1] will be equal to 462.0 and bin_high[1] to 477.7.
Part 3: This is where we actually compute the bin boundaries. Here we count through the bins with the variable i. The logic we use is that bin_low[1] should be 462 (the minimum number in the data set) and bin_high[1] should be $462+\Delta$. With this, bin_low[2] should be the minimum + $\Delta$, bin_low[3] should be the minimum + $2\Delta$, and so on. (Notice the pattern is bin_low[n] = min + (n-1)$\Delta$.)
Any bin_high value is just its corresponding bin_low + $\Delta$.
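The lesson's own code is not reproduced here, so the following is only a Python sketch of how Parts 2 and 3 might be organized. The function name `bin_boundaries` and its parameters are hypothetical. The names count, bin_low, bin_high, and bins follow the breakdown above. Python lists start at index 0 rather than 1, so the pattern becomes bin_low[i] = min + i$\Delta$.

```python
def bin_boundaries(data_min, data_max, bins):
    """Return delta, an empty count list, and the low/high boundary lists."""
    # Part 2: the bin width and the arrays we'll need.
    delta = (data_max - data_min) / bins
    count = [0] * bins        # filled in later, when the histogram is made
    bin_low = [0.0] * bins
    bin_high = [0.0] * bins

    # Part 3: compute the boundaries of each bin.
    for i in range(bins):
        bin_low[i] = data_min + i * delta   # 0-based form of min + (n-1)*delta
        bin_high[i] = bin_low[i] + delta    # high is always low + delta
    return delta, count, bin_low, bin_high


# The minimum and maximum from the lesson stand in for Part 1 here.
delta, count, bin_low, bin_high = bin_boundaries(462, 619, 10)
for low, high in zip(bin_low, bin_high):
    print(f"{low:.1f} to {high:.1f}")
```

A quick sanity check is that each bin's high boundary matches the next bin's low boundary, and that the last high boundary lands on the maximum.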