Lesson goal: Data Science: Two column data files


If you've made it through the last few lessons, you're doing great. You know how to open a file, read from it, and even do a bit of data mining (meaning, figuring out things from the data).

It turns out that one of the big challenges in data science is dealing with data files and getting them into your code so you can analyze them. The last few lessons used a file called sample.dat, a one-column data file of numbers. Easy, huh?

Suppose now we have a two-column data file, where each line is a pair of numbers separated by a comma. (These are usually called "CSV files." CSV stands for comma-separated values.)

So here's the next challenge: dealing with a two-column data file. Let's start with a file called co2.csv, which has two columns. The first is the year, and the second is the CO$_2$ concentration in our atmosphere. Here are the first few lines of the file:
1958,316
1959,316
1960,317
1961,318
1962,319
1963,320
1964,320
...


At this point, see if you can simply display the file to the screen. Looking at the raw data first is a sensible step even at a 'real' data mining job.

But here's the issue: what do you do with the two columns? You can't use file_read_number, because each line of the file isn't a single plain number. It's a CSV file in the format year,concentration. Luckily, we have another function called file_read that reads a line from a file into a string. It doesn't try to turn the line into a number. It just reads in what it sees and gives it to you.
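To see why a number reader stumbles on a line like this, here is a small sketch in Python (not this lesson's own language). Python's built-in `float` stands in for a number-reading function; that it behaves like file_read_number is only an assumption. The line itself is taken from the sample shown above.

```python
line = "1958,316"  # one line of co2.csv, as shown above

# Treating the whole line as a single number fails because of the comma.
try:
    value = float(line)
    parsed_as_number = True
except ValueError:
    parsed_as_number = False

print("parsed as a number?", parsed_as_number)

# Keeping the line as a string always works: we get exactly what is in the file.
print("as a string:", line)
```

The string is not yet useful for arithmetic, but nothing in the line has been lost or misread.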

We'll show you how to work with a string of numbers in the next lesson. At this point, just see if you can display the data in co2.csv to the screen.
data=file_read(handle)
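For comparison, the same read-a-line-and-display-it idea might look like this in Python. This is only a sketch, not the lesson's solution: `io.StringIO` holds the first few sample lines as a stand-in for co2.csv so the example runs without the file, and the variable names are chosen for illustration. With the real file you would use `open("co2.csv")` instead.

```python
import io

# Stand-in for co2.csv: only the first lines shown in this lesson.
sample = io.StringIO(
    "1958,316\n1959,316\n1960,317\n1961,318\n1962,319\n1963,320\n1964,320\n"
)

lines = []
for raw in sample:            # each pass reads one line as a string
    data = raw.rstrip("\n")   # drop the trailing newline, keep everything else
    lines.append(data)
    print(data)               # display the line exactly as it appears in the file
```

Each `data` value is still a string such as "1958,316"; no attempt is made to split it into a year and a concentration.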

Now you try. Fix the data= line and see if you can print the contents of co2.csv to the screen.
