Python for Energy Modelers – Part 3 – Simple Post-processing

Last time we looked at scripting the creation of many input files for a parametric study of a building energy concept. Now we will turn to the post-processing side of the energy modeling workflow. Often, a spreadsheet tool like Excel is a first choice for many analysis tasks. This is great for simple cases, but if the number of files or the amount of data is large or complex, Excel will cost you time and lead to errors. This is where you should turn to Python!

Let’s look at a concrete example from a project I worked on a few months ago. We needed to generate building load profiles for 3 stock building geometries. We were investigating 8 different internal loads (office, residential, etc.), 5 different insulation types, and various other parameters. After running an input file generation script similar to last week’s, I had 384 TRNSYS input files! These were all executed overnight, resulting in 384 output files. An extract from one of these output files is below:

TIME SQHEAT SQCOOL
+0.0000000000000000E+00 +0.0000000000000000E+00 +0.0000000000000000E+00
... MORE ...
+5.7000000000000000E+01 +1.3237372413443035E+05 +0.0000000000000000E+00
+5.8000000000000000E+01 +7.7162712393718830E+04 +0.0000000000000000E+00
+5.9000000000000000E+01 +6.5336426881896477E+04 +0.0000000000000000E+00
+6.0000000000000000E+01 +6.7508927183426335E+04 +3.5916664027267243E+03
+6.1000000000000000E+01 +6.3983667187962485E+04 +4.5933382762949959E+03
+6.2000000000000000E+01 +4.7441687548894530E+04 +1.5935811219563341E+03
+6.3000000000000000E+01 +3.3644472803687437E+04 +2.9517575364511831E+03
+6.4000000000000000E+01 +4.2939548284044671E+04 +0.0000000000000000E+00
+6.5000000000000000E+01 +5.1990203122431951E+04 +0.0000000000000000E+00
+6.6000000000000000E+01 +5.9902003142972004E+04 +0.0000000000000000E+00
... MORE ...
+8.7600000000000000E+03 +1.8288512650404076E+05 +0.0000000000000000E+00

This is a typical TRNSYS output file: a tab-separated ASCII text file with a single header line. Each row holds the time stamp (one per hour, up to hour 8760 of the year), followed by the average heating and cooling load for that hour (kJ/hr). This layout is common to many other simulation tools, although the delimiter might be a comma, or the header lines or number formatting might be different.
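To illustrate how little changes when the delimiter differs, here is a minimal sketch that parses a tab-separated and a comma-separated sample from in-memory strings. The helper readTable() and the shortened sample values are made up for this illustration, and the code assumes Python 3.

```python
import csv
import io

def readTable(text, delimiter="\t", headerLines=1):
    """Parse delimited text into (header rows, list of float rows)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Pull off the header line(s) before converting the data rows
    header = [next(reader) for _ in range(headerLines)]
    data = [[float(item) for item in row] for row in reader]
    return header, data

# Hypothetical, shortened samples in two flavours of the same layout
tabText = "TIME\tSQHEAT\tSQCOOL\n+6.0E+01\t+6.75E+04\t+3.59E+03\n"
commaText = "TIME,SQHEAT,SQCOOL\n60,67500,3590\n"

tabHeader, tabData = readTable(tabText)
commaHeader, commaData = readTable(commaText, delimiter=",")
print(tabData)
print(commaData)
```

Both calls give the same numbers; only the delimiter argument changes.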

From these 384 output files, we might ask: Which building instance requires the most heating or cooling over the year? What is the correlation between average envelope U-value and the system loads? Many other questions like these require some statistical analysis. Generally, our problem statement is: How do we statistically analyze and summarize a large number of delimited text simulation output files? Fortunately, Python has some core modules specifically designed to read and process these files.

Once again, let’s write out some pseudo-code:

# Specify directory and file extension
# Collect the file names of interest from the directory
# For each file found, process each in turn
    # Open it
    # Read it as a comma separated style format
    # For each row of the file
        # Access the 2nd and 3rd column
        # Convert [kJ/hr] into [Watts]
        # Save the data to our list
    # Calculate the cooling and heating load in [Wh]
    # Calculate the average load in [W]
    # Close the file
    # Write results for the processed file

We have a similar block structure as last time: a problem parameter section (the first two lines), followed by a looping structure. This looping is exactly where Python does the repetitive, boring tasks to save your sanity!

Looking a bit closer at the rest of the pseudo-code, we can see that there are two loops going. First, we loop over each file that was discovered in our project directory, in this case 384 ASCII text files with the extension “.out”. Second, for each file we loop over its rows. Note also that the “.out” files have one line of header data, which will be skipped. The rest of the file is then scanned and the data extracted. Once these data are saved, we can perform any conversion and statistics we need.

Let’s start by loading in some modules and specifying parameters:

import os
import csv

# Specify directory and file extension
searchDirectory = os.path.normpath(r"C:\Project5\Output")
searchEnding = ".out"

First, I am importing some “helper” functions and objects to make this script easier. Specifically, I am using the “Operating System” module (os), which makes it easy to work with both Windows and Linux paths, and the csv module for reading delimited text files.

Note also the r"" prefix on the path string. In Python, the "\" character is a special escape character, used to represent other ASCII characters like tab (\t), newline (\n), null (\0), etc. The r in r"string" marks a raw string, which tells Python not to interpret backslashes as escapes, so a single backslash per path separator is enough here. In a normal string I would have to write the double backslash "\\" instead, and out of habit I often write "\\" even in raw strings; on Windows, os.path.normpath() collapses the doubled separators, so that works too. Finally, I am using the os.path.normpath() function to make 100% sure my path is in a proper format. This is mostly redundant, but I do it because I have too often made the mistake of improper path names!
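A quick sketch of how escape characters and raw strings behave. The variable names are made up, and ntpath (the Windows flavour of os.path) is used only so that the Windows path handling can be tried on any operating system.

```python
import ntpath  # Windows implementation of os.path, importable on any platform

normalString = "C:\\Project5\\Output"   # normal string: each "\\" is one backslash
rawString = r"C:\Project5\Output"       # raw string: backslashes are taken literally
doubledRaw = r"C:\\Project5\\Output"    # raw string with doubled separators

# The normal and the raw spelling describe the same path
sameText = (normalString == rawString)
print(sameText)

# "\t" is a single TAB character, r"\t" is a backslash followed by a t
print(len("\t"), len(r"\t"))

# normpath() collapses the doubled separators of the third spelling
cleaned = ntpath.normpath(doubledRaw)
print(cleaned == rawString)
```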

Finally, we want to ignore anything in the search directory that doesn’t have the “.out” extension.

Next, I want to scan this directory and store any “.out” file paths:

# Collect the file names of interest from the directory
inputFilePaths = list()
for filename in os.listdir(searchDirectory):
    if filename.endswith(searchEnding):
        # Found a file with proper extension
        fullFilePath = os.path.join(searchDirectory, filename)
        # Add it to our running list
        inputFilePaths.append(fullFilePath)

# Some feedback on what was found
# (print with parentheses works in both Python 2 and Python 3)
print("Found {0} '{1}' files in {2}. ".format(
    len(inputFilePaths), searchEnding, searchDirectory
))

First, I create a blank list in order to store all the paths I find. Try this code without first creating your list: Python will stop with a NameError at the first append()! Python is normally very forgiving about creating and changing variables later in your code (this is called “dynamic typing”, and is one reason why Python is so flexible and “easy” to program), but in this case we are adding elements to a list one at a time, so the list has to exist before the loop starts. Matlab programmers may be reminded of the discussion around “pre-allocating” arrays.
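To see what happens without the empty list, here is a small sketch. The names undeclaredPaths and declaredPaths are invented for the demonstration.

```python
# Appending to a name that was never created raises a NameError
try:
    undeclaredPaths.append("file1.out")
except NameError as error:
    message = str(error)
    print("Python complains:", message)

# Creating the empty list first makes append() work
declaredPaths = list()
declaredPaths.append("file1.out")
print(declaredPaths)
```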

We take advantage of the os.listdir() function, which, yes indeed, lists the contents of a directory. We loop over each filename found and check whether it endswith() our extension “.out”. If we do find a match, we store the full path in our list. Since listdir() only gives us the file name and not the parent directory, we need to join() the two together with os.path.join().
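If you want to try the directory scan without a real project folder, the following sketch builds a throw-away directory with a few dummy files and applies the same filter in a more compact form. The file names are hypothetical, and the code assumes Python 3.

```python
import os
import tempfile

# Build a throw-away directory with a few empty dummy files
demoDirectory = tempfile.mkdtemp()
for name in ["case001.out", "case002.out", "case001.log", "layout"]:
    open(os.path.join(demoDirectory, name), "w").close()

# Same filter as the loop version, written as one sorted expression
demoPaths = sorted(
    os.path.join(demoDirectory, name)
    for name in os.listdir(demoDirectory)
    if name.endswith(".out")
)
foundNames = [os.path.basename(path) for path in demoPaths]
print(foundNames)
```

Only the two “.out” files survive the filter. Including the dot in the ending keeps a file such as “layout” from matching by accident, and sorting gives a repeatable processing order, since os.listdir() makes no promise about ordering.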

Finally, let’s provide some feedback for this section by printing how many files were found. Note the syntax of the "{}".format() call. This is an alternative way to build and format strings with variables included, and is preferred for flexibility and control. You can also just list strings and variables together when printing, but this can get awkward…
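As a rough comparison of the two styles, the sketch below builds the same message once with format() and once by gluing strings together. The values are made up to mirror the example project.

```python
fileCount = 384
ending = ".out"
directory = r"C:\Project5\Output"

# Option 1: numbered placeholders filled in by format()
formatted = "Found {0} '{1}' files in {2}.".format(fileCount, ending, directory)

# Option 2: manual concatenation, where numbers must be converted with str()
concatenated = "Found " + str(fileCount) + " '" + ending + "' files in " + directory + "."

print(formatted)
print(formatted == concatenated)

# format() also controls number formatting, here one decimal place
summary = "Total cooling: {0:.1f} kWh".format(1234.5678)
print(summary)
```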

# For each file found, process each in turn
for filePath in inputFilePaths:
    # Open it
    openedFile = open(filePath)
    # Read it as a something separated style format (\t is TAB)
    openedCSV = csv.reader(openedFile, delimiter="\t")
    # ! Skip the header line !
    next(openedCSV)
    # Blank list for storing data
    coolingLoad = list()
    # For each row of the file
    for row in openedCSV:
        # For each row, access the 3rd column (index 2, SQCOOL)
        # Convert [kJ/hr] into [Watts]
        thisHoursCoolingLoad = float(row[2]) / 3.6
        # Save the data to our list
        coolingLoad.append(thisHoursCoolingLoad)
    # Calculate the total cooling load in [Wh] (hourly averages in [W] x 1 hr each)
    totalCoolingLoad = sum(coolingLoad)
    # Close the file
    openedFile.close()
    print(totalCoolingLoad)

Here we can see the same looping structure, first over each file, and then over each line of each file. The first step is to open the file and use the “csv” module to provide some additional functionality: the capability to read in a file line by line, with a certain delimiter (TAB in our case), and to return each line as a list of items.

Next, the file is scanned row by row, returning the list of items. In our case, we have 3 items: TIME, SQHEAT, and SQCOOL. For the sake of brevity, I am only accessing the cooling load in this example. This is indexed in the list by [2]. Python indexing starts at 0, therefore [2] is the third list element! In the same line, I first convert the +1.32E+05 style strings into Python floating point numbers, and then convert this number from [kJ/hr], which is a native TRNSYS unit, into the more comfortable [W] by dividing by 3.6. This number is then appended to a running list of cooling loads (in other words, the entire 3rd column of the data table).
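The conversion can be checked by hand with a single value. The sketch below takes the SQCOOL entry at hour 60 from the extract above; the constant and variable names are my own.

```python
# 1 kJ/hr = 1000 J / 3600 s, which is the same as dividing by 3.6
KJ_PER_HR_TO_W = 1000.0 / 3600.0

rawValue = "+3.5916664027267243E+03"  # SQCOOL at hour 60, as text from the file
coolingKJperHr = float(rawValue)      # float() understands the E+03 notation
coolingW = coolingKJperHr * KJ_PER_HR_TO_W

print(round(coolingW, 1))
```

So roughly 3592 kJ/hr corresponds to just under 1 kW of average cooling load for that hour.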

Finally, for each file, the column list is summed and printed. Since each value is an hourly average in [W], the sum is the annual cooling energy in [Wh].

Bonus Exercises!

  1. How would you modify this script to save the summary results into a new file?
  2. Another common engineering question: at what time does the peak cooling or heating load occur? How could you find this peak, and then the corresponding time from the first [0] column?
  3. An amazing module to check out: NumPy, one of the most exciting modules for us as energy modelers or engineers. This module, and others (matplotlib, SciPy), promise a free, open source alternative to e.g. Matlab, but with the full power of Python as a bonus. What elements of NumPy would make this script more general, clear, or useful?
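If you get stuck on exercises 1 and 2, here is one possible starting point, offered as a sketch rather than the definitive answer. The helper summarizeCooling(), the rounded sample rows, and the file names are all made up for illustration, and the code assumes Python 3 and uses only the standard library.

```python
import csv
import os
import tempfile

def summarizeCooling(rows):
    """Return (total [Wh], peak [W], time of peak [hr]) from (time, heat, cool) rows in kJ/hr."""
    times = [float(row[0]) for row in rows]
    coolingW = [float(row[2]) / 3.6 for row in rows]
    peakW = max(coolingW)
    peakTime = times[coolingW.index(peakW)]  # first occurrence of the peak
    return sum(coolingW), peakW, peakTime

# Hypothetical sample rows, rounded from the extract above
sampleRows = [
    ["60", "67508.9", "3591.7"],
    ["61", "63983.7", "4593.3"],
    ["62", "47441.7", "1593.6"],
]
total, peak, peakTime = summarizeCooling(sampleRows)

# Write one summary line per processed file (here just one, with a made-up name)
summaryPath = os.path.join(tempfile.mkdtemp(), "summary.csv")
with open(summaryPath, "w", newline="") as summaryFile:
    writer = csv.writer(summaryFile)
    writer.writerow(["file", "total cooling [Wh]", "peak cooling [W]", "peak time [hr]"])
    writer.writerow(["case001.out", round(total, 1), round(peak, 1), peakTime])

print(peakTime, round(peak, 1))
```

For the sample rows the peak falls at hour 61. In the real script you would call the helper once per output file inside the file loop and write one row per file.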

Now we have looked at both sides of the energy modeling workflow, pre-processing and post-processing. Hopefully, you can see how powerful this combination of Python+Simulation is! Next time, we will review the overall process of scripting, best practices, and issues like compatibility and installation!

Cheers from Vienna,

Marcus