If you've made it through the last few lessons, you're doing great. You know how to open a file, read from it, and even do a bit of data mining (meaning, figuring out things from the data).
It turns out that one of the big challenges in data science is dealing with data files and getting them into your code so you can analyze them. The last few lessons used a file called sample.dat. It was a one-column data file of numbers. Easy, huh?
Suppose now we have a two-column data file, where each line is a pair of numbers separated by a comma. (These are usually called "CSV files." CSV stands for comma-separated values.)
So here's the next challenge: dealing with a two-column data file. Let's start using a file called co2.csv, which has two columns. The first is the year, and the second is the CO$_2$ concentration in our atmosphere. Here are the first few lines of the file:
1958,316
1959,316
1960,317
1961,318
1962,319
1963,320
1964,320
...
...
At this point, see if you can simply display the file to the screen. This is not a bad thing to do even at a 'real' data mining job.
But here's the issue: what do you do with the two columns? You can't use file_read_number, because each line of the file isn't a single number. It's a CSV file in the format year,concentration. Luckily, we have another function called file_read that just reads a line from the file into a string. It doesn't try to process it into a number. It just reads in what it sees and gives it to you.
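To make the "read a line as a string" idea concrete, here is a minimal sketch in Python. It is only an analogue of file_read, not the lesson's own function: the helper name read_lines_as_strings is made up for illustration, and the sketch writes the first three rows shown above to a temporary stand-in for co2.csv so that it runs on its own.

```python
import os
import tempfile

# Stand-in data: the first three rows shown in the lesson.
SAMPLE_ROWS = ["1958,316", "1959,316", "1960,317"]


def read_lines_as_strings(path):
    """Hypothetical helper: return each line of the file as a plain string.

    No attempt is made to turn the text into numbers.
    """
    lines = []
    with open(path) as f:
        for line in f:
            # Drop the trailing newline but keep everything else as-is.
            lines.append(line.rstrip("\n"))
    return lines


# Write the stand-in rows to a temporary co2.csv, then read them back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "co2.csv")
    with open(path, "w") as f:
        f.write("\n".join(SAMPLE_ROWS) + "\n")
    lines = read_lines_as_strings(path)

# Display each line exactly as it was read.
for line in lines:
    print(line)
```

Each element of lines is a string such as "1958,316", comma included. Nothing has been split or converted yet, which is the same situation file_read leaves you in.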
We'll show you how to work with a string of numbers in the next lesson. At this point, just see if you can display the data in co2.csv to the screen.