I’ve got a lot of ground to cover, picking up where I left off several months ago. In earlier posts I presented the concept for creating reports from SQLite databases using Python, Jinja, and LaTeX, and looked at different methods for passing data from the database to the template. I’m using the NYC Geodatabase as the test case. In this entry I’ll cover how I implemented my preferred approach: creating Python dictionaries to pass to the Jinja template.
One of the primary decisions I had to make was how to loop through the database. Since the reports we’re making are profiles (lots of different data for one geographic area), we’re going to want to loop through the database by geography. So, for each geography we select all the data from a specific table, pass the data out to the template where the pertinent variables are pulled, build the report, and move on to the next geography. In contrast, if we were building comparison tables (one specific variable for many geographic areas) we would want to loop through the data by variable.
In the beginning of the script we import the necessary modules, set up the Jinja environment, and specify our template (not going to repeat that code here – see the previous post). Then we have our function that creates a dictionary for a specific data table for a specific geography:
def pulltab(tabname,idcol,geog):
    query='SELECT * FROM %s WHERE %s = %s' %(tabname,idcol,geog)
    curs.execute(query)
    col_names = [cn[0] for cn in curs.description]
    rows = curs.fetchall()
    for row in rows:
        thedict=dict(zip(col_names,row))
    return thedict
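To make the return value concrete, here is a small self-contained sketch of the same column-name and row zip, run against an in-memory database. The table `demo_acs`, its values, and the function name `pulltab_demo` are made up for illustration. Unlike the function above, this version binds the geography ID as a query parameter instead of formatting it into the string (table and column names still have to be formatted in, since SQLite can’t bind identifiers), and it returns an empty dictionary when no row matches.

```python
import sqlite3

# Hypothetical in-memory table standing in for one of the ACS data tables
conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE demo_acs '
             '(GEOID2 INTEGER, GEOLABEL TEXT, COM02_E INTEGER, COM02_M INTEGER)')
curs.executemany('INSERT INTO demo_acs VALUES (?,?,?,?)',
                 [(1, 'Area One', 1200, 85),
                  (2, 'Area Two', 3400, 120)])

def pulltab_demo(tabname, idcol, geog):
    # Identifiers are formatted into the string; the ID value is bound with ?
    query = 'SELECT * FROM %s WHERE %s = ?' % (tabname, idcol)
    curs.execute(query, (geog,))
    col_names = [cn[0] for cn in curs.description]
    row = curs.fetchone()
    # Pair each column name with its value; empty dict if nothing matched
    return dict(zip(col_names, row)) if row else {}

result = pulltab_demo('demo_acs', 'GEOID2', 2)
missing = pulltab_demo('demo_acs', 'GEOID2', 99)
print(result)
print(missing)
conn.close()
```

Each column name becomes a key, so the template can ask for a value by variable name without knowing the column order.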
We connect to the database and create a dictionary of all the geographies (limited to 3 PUMAs since this is just a test):
#Connect to database and create dictionary of all geographies
conn = sqlite3.connect('nyc_gdb_jan2015a/nyc_gdb_jan2015.sqlite')
curs = conn.cursor()
curs.execute('SELECT geoid10, namelsad10 FROM a_pumas2010 ORDER BY geoid10 LIMIT 3')
rows = curs.fetchall()
geodict=dict(rows)
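The `dict(rows)` call works because `fetchall()` returns a list of tuples, and a list of two-item tuples converts directly into key/value pairs. A quick illustration, using made-up IDs and names rather than real PUMA records:

```python
# Made-up (id, name) pairs shaped like the rows fetchall() returns
rows = [('0001', 'Example Area A'), ('0002', 'Example Area B')]

# The first item of each tuple becomes the key, the second the value
geodict = dict(rows)

for geog in sorted(geodict):
    print(geog, geodict[geog])
```

This only works when the query selects exactly two columns; with more columns per row, `dict()` raises an error.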
We then generate the reports by looping through the geographies in that dictionary. For each one we pass its ID to the function, which pulls that geography’s data from a data table into a dictionary:
#Generate reports by looping through geographies and passing out
#dictionaries of values
for geog in geodict.keys():
    acs1dict=pulltab('b_pumas_2013acs1','GEOID2',geog)
    acs2dict=pulltab('b_pumas_2013acs2','GEOID2',geog)
    name=geodict.get(geog)
    filename='zzpuma_' + geog + '.tex'
    folder='test5'
    outpath=os.path.join(folder,filename)
Lastly, we pass the dictionaries out to the template, and run LaTeX to generate the report from the template:
    outfile=open(outpath,'w')
    outfile.write(template.render(geoid=geog, geoname=name, acs1=acs1dict, acs2=acs2dict))
    outfile.close()
    os.system("pdflatex -output-directory=" + folder + " " + outpath)

conn.close()
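As a possible alternative to building the command as one string for `os.system`, the same call can be expressed as an argument list for `subprocess.run`, which sidesteps problems with spaces in paths and can raise an error when LaTeX fails. This is a sketch, not what the script above does: the helper names are my own, and it assumes Python 3.5 or later with `pdflatex` on the system path.

```python
import os
import subprocess

def pdflatex_command(folder, outpath):
    # Same arguments as the os.system string, one list item per argument
    return ['pdflatex', '-output-directory=' + folder, outpath]

def build_pdf(folder, outpath):
    # check=True raises CalledProcessError if pdflatex exits with an error
    subprocess.run(pdflatex_command(folder, outpath), check=True)

# Build (but don't run) the command for a hypothetical report file
cmd = pdflatex_command('test5', os.path.join('test5', 'zzpuma_example.tex'))
print(cmd)
```

Inside the loop, `build_pdf(folder, outpath)` would take the place of the `os.system` line.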
The Jinja template (as a LaTeX file) is below; the example here is similar to what I covered in my previous post. We passed two dictionaries into the template, one for each data table. Each key is the name of a variable (the column name in the table) and its value is an American Community Survey estimate or margin of error. We pass in the key and get the value in return. The PDF output follows.
\documentclass{article}
\usepackage[margin=0.5in]{geometry}
\usepackage{graphicx}
\usepackage[labelformat=empty]{caption}
\usepackage[group-separator={,}]{siunitx}

\title{\VAR{acs1.get('GEOLABEL') | replace("&","\&")} \VAR{acs1.get('GEOID2')}}
\date{}

\begin{document}

\maketitle
\pagestyle{empty}
\thispagestyle{empty}

\begin{table}[h]
\centering
\caption{Commuting to Work - Workers 16 years and over}
\begin{tabular}{|c|c|c|c|c|}
\hline
& Estimate & Margin of Error & Percent Total & Margin of Error\\
\hline
Car, truck, or van alone & \num{\VAR{acs1.get('COM02_E')}} & +/- \num{\VAR{acs1.get('COM02_M')}} & \num{\VAR{acs1.get('COM02_PC')}} & +/- \num{\VAR{acs1.get('COM02_PM')}}\\
Car, truck, or van carpooled & \num{\VAR{acs1.get('COM03_E')}} & +/- \num{\VAR{acs1.get('COM03_M')}} & \num{\VAR{acs1.get('COM03_PC')}} & +/- \num{\VAR{acs1.get('COM03_PM')}}\\
Public transit & \num{\VAR{acs1.get('COM04_E')}} & +/- \num{\VAR{acs1.get('COM04_M')}} & \num{\VAR{acs1.get('COM04_PC')}} & +/- \num{\VAR{acs1.get('COM04_PM')}}\\
Walked & \num{\VAR{acs1.get('COM05_E')}} & +/- \num{\VAR{acs1.get('COM05_M')}} & \num{\VAR{acs1.get('COM05_PC')}} & +/- \num{\VAR{acs1.get('COM05_PM')}}\\
Other means & \num{\VAR{acs1.get('COM06_E')}} & +/- \num{\VAR{acs1.get('COM06_M')}} & \num{\VAR{acs1.get('COM06_PC')}} & +/- \num{\VAR{acs1.get('COM06_PM')}}\\
Worked at home & \num{\VAR{acs1.get('COM07_E')}} & +/- \num{\VAR{acs1.get('COM07_M')}} & \num{\VAR{acs1.get('COM07_PC')}} & +/- \num{\VAR{acs1.get('COM07_PM')}}\\
\hline
\end{tabular}
\end{table}

\begin{table}[h]
\centering
\caption{Housing Tenure}
\begin{tabular}{|c|c|c|c|c|}
\hline
& Estimate & Margin of Error & Percent Total & Margin of Error\\
\hline
Occupied housing units & \num{\VAR{acs2.get('HTEN01_E')}} & +/- \num{\VAR{acs2.get('HTEN01_M')}} & &\\
Owner-occupied & \num{\VAR{acs2.get('HTEN02_E')}} & +/- \num{\VAR{acs2.get('HTEN02_M')}} & \num{\VAR{acs2.get('HTEN02_PC')}} & +/- \num{\VAR{acs2.get('HTEN02_PM')}}\\
Renter-occupied & \num{\VAR{acs2.get('HTEN03_E')}} & +/- \num{\VAR{acs2.get('HTEN03_M')}} & \num{\VAR{acs2.get('HTEN03_PC')}} & +/- \num{\VAR{acs2.get('HTEN03_PM')}}\\
\hline
\end{tabular}
\end{table}

\end{document}
In this example we took the simple approach of grabbing all the variables that were in a particular table, and then we just selected what we wanted within the template. This is fine since we’re only dealing with 55 PUMAs and a table that has 200 columns or so. If we were dealing with gigantic tables or tons of geographies, we could modify the Python script to pull just the variables we wanted to speed up the process; my inclination would be to create a list of variables in a text file, read that list into the script and modify the SQL function to just select those variables.
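A rough sketch of that idea, assuming a plain text file with one column name per line. The file name `varlist.txt`, the demo table, and the helper names `read_varlist` and `pullvars` are all hypothetical. Because column names get formatted into the SQL string, the sketch checks each one against the table’s actual columns first and silently drops any that don’t exist.

```python
import sqlite3

def read_varlist(path):
    # One column name per line; blank lines are skipped
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def pullvars(curs, tabname, idcol, geog, varlist):
    # Look up the table's real columns so only known names reach the query
    curs.execute('SELECT * FROM %s LIMIT 0' % tabname)
    valid = set(cn[0] for cn in curs.description)
    cols = [v for v in varlist if v in valid]
    if not cols:
        return {}
    query = 'SELECT %s FROM %s WHERE %s = ?' % (', '.join(cols), tabname, idcol)
    curs.execute(query, (geog,))
    row = curs.fetchone()
    return dict(zip(cols, row)) if row else {}

# Demo with an in-memory table and an in-line list instead of a file
conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE demo_acs '
             '(GEOID2 INTEGER, COM02_E INTEGER, COM02_M INTEGER, COM03_E INTEGER)')
curs.execute('INSERT INTO demo_acs VALUES (1, 1200, 85, 300)')
subset = pullvars(curs, 'demo_acs', 'GEOID2', 1,
                  ['COM02_E', 'COM02_M', 'NOT_A_COLUMN'])
print(subset)
conn.close()
```

In the real script the list would come from something like `read_varlist('varlist.txt')`, read once before the loop so the file isn’t reopened for every geography.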
What if we want to modify some of the variables before we pass them into the template? I’ll cover that in the next post.