Keeping data and code together with org-mode
With org-mode you can keep data, code, and documentation in one file.
Suppose you have an org-mode file containing the following table.
#+NAME: mydata
| Drug | Patients |
|------+----------|
| X    |      232 |
| Y    |      351 |
| Z    |      117 |
Note that there cannot be a blank line between the NAME header and the beginning of the table.
You can bring this table into Python simply by declaring it to be a variable in the header of a Python code block.
#+begin_src python :var tbl=mydata :results output
print(tbl)
#+end_src
When you evaluate this block, you see that the table is imported as a list of lists.
[['X', 232], ['Y', 351], ['Z', 117]]
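Once the table is an ordinary list of lists, plain Python is enough to work with it. The following is a minimal sketch that runs outside org-mode: the list is typed in by hand to match the output above, and the names patients_by_drug and total_patients are made up for illustration.

```python
# Hand-written copy of the value printed above. Inside org-mode this
# would be supplied by the header argument :var tbl=mydata instead.
tbl = [["X", 232], ["Y", 351], ["Z", 117]]

# Each row is a [drug, patients] pair, so it unpacks into a dictionary.
patients_by_drug = {drug: patients for drug, patients in tbl}

# Sum the patient counts across all drugs.
total_patients = sum(patients_by_drug.values())

print(patients_by_drug["Y"])
print(total_patients)
```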
Note that the column headings were not imported into Python. Now suppose you would like to retain the headers, and use them as column names in a pandas data frame.
#+begin_src python :var tbl=mydata :colnames no :results output
import pandas as pd

df = pd.DataFrame(tbl[1:], columns=tbl[0])
print(df, "\n")
print(df["Patients"].mean())
#+end_src
When evaluated, this block produces the following.
  Drug  Patients
0    X       232
1    Y       351
2    Z       117

233.33333333333334
Note that in order to import the column names, we told org-mode that there are no column names! We did this with the header option
:colnames no
This seems backward, but it makes sense. The option tells org-mode to treat the first row as ordinary data rather than as a header, so that row is passed to Python with the rest of the table instead of being dropped, as it is by default. We then tell pandas to build a data frame from all but the first row (i.e. tbl[1:]) and to use the first row (i.e. tbl[0]) as the column names.
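The slicing step can be tried outside org-mode as well. The sketch below assumes that, with :colnames no, tbl arrives as a list of lists whose first element is the heading row; the list is written out by hand here, and the names header, rows, and mean_patients are illustrative only.

```python
import pandas as pd

# Assumed value of tbl under :colnames no: the heading row is the
# first element, followed by the data rows. Typed in by hand here.
tbl = [["Drug", "Patients"], ["X", 232], ["Y", 351], ["Z", 117]]

# Split the heading row from the data rows.
header, rows = tbl[0], tbl[1:]

# Build the data frame from the data rows, using the heading row
# as the column names.
df = pd.DataFrame(rows, columns=header)

mean_patients = df["Patients"].mean()
print(list(df.columns))
print(mean_patients)
```

Naming the two slices before building the data frame makes it explicit which part of the table becomes the column names and which part becomes the rows.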
A possible disadvantage to keeping data and code together is that the data could be large. But since org files are naturally in outline mode, you could collapse the part of the outline containing the data so that you don't have to look at it unless you need to.
The post Keeping data and code together with org-mode first appeared on John D. Cook.