Friday, May 03, 2019

Creating Excel spreadsheet with IBM i library report using Python

A few days ago Mike Larsen published a great article titled "Creating Excel Spreadsheets With Python", in which he described how easy it is to populate Excel spreadsheets with data from IBM i. Mike demonstrated several features like formatting, tab colors and conditional formatting in his example and showed how few Python lines of code are needed in order to get a good-looking result.

His article reminded me of another Python script, that I wrote last year, using the very same technique. My Python script is called "lib_report.py" and creates an Excel spreadsheet containing extracted information about objects in a library, including

  • Object count
  • Object size total
  • Object size total by age
  • Object name, type, description, size and last used date

One of the most useful features of the spreadsheet generated is the filter on the detail information, where the user can filter and sort the information. This Python scripts demostrates some very useful Excel features not shown in Mikes article, so I publish my script to complement the article by Mike.

The source for the Python script can be found here and is a mere 100 lines - including comments! The Python modules used are again 'xlsxwriter' and 'ibm_db_dbi' in addition to the 'sys' module. I will go through the script and explain what the code does and how the Excel features are implemented:

#!/usr/bin/env python3

This line is a so-called "shebang", telling the shell how to run the script. I won't go into details (and you can read more about shebangs here), but it is a nice way to make text files into executables.

import xlsxwriter
import ibm_db_dbi as dbi
import sys

Here we import the modules used in the Python script.

lib = sys.argv[1].upper()

A Python variable called "lib" is set to the value (in uppercase) of the first argument to the script. You specify what library to examine by calling the script and adding the name of the library as a parameter, e.g. "lib_report.py qgpl".

output_filename = 'lib_report_' + lib + '.xlsx'

Here we create the filename of the Excel spreadsheet. The filename will be on the form "lib_report_<library name>.xlsx".

conn = dbi.connect(dsn=None, database='*LOCAL', user=None, password=None)
cur = conn.cursor()

These lines connect to the database and create a cursor variable.

sql_allobj = """\
select objname, objtype, objtext, objsize,
       coalesce( date( last_used_timestamp ), date( objcreated ) ) as objdate
  from table( qsys2.object_statistics( ?, '*ALL' ) ) os
 order by objsize desc
"""

Here we define the SQL statement, that will extract the object information from the library. Note that we use the SQL Service OBJECT_STATISTICS to do the job, and the statement will receive the library name as a parameter - the "?" in the statement text.

print('Getting object statistics for ' + lib + '.')

sql_parms = [ lib ]
cur.execute(sql_allobj, sql_parms)
headers = [descr[0] for descr in cur.description]

We print a status information on the console, set the parameters for the SQL statements and executes the statement. The column headers from the result set are extracted so we can write them to the Excel sheet later.

print('Creating Excel file with object statistics.')

workbook = xlsxwriter.Workbook(output_filename)
worksheet_summary = workbook.add_worksheet('Summary')
worksheet_details = workbook.add_worksheet('Detail')

Again we print status to the console before creating the Excel file. Then we add two sheets to the spreadsheet - one for the summary and another for the detailed information.

# Add a bold format for the headers.
bold = workbook.add_format({'bold': True})
bold_right = workbook.add_format({'bold': True, 'align': 'right'})

Some formats are created to be used with the headers, like bold and right-justifying. You can't add formatting directly to one or more cells, you have to create a format object first in Excel and then apply the format object to the cell(s). This can be very cumbersome because you can only specify one format object to a cell and thus have to create a format object for each distinct format used in the Excel spreadsheet.

# Make the columns wider.
worksheet_details.set_column('A:A', 14)
worksheet_details.set_column('B:B', 10)
worksheet_details.set_column('C:C', 48)
# Make the header rows larger and bold.
worksheet_details.set_row(0, 20, bold)
worksheet_details.set_row(1, 20, bold)
worksheet_details.set_row(2, 20, bold)
# Write the headers.
worksheet_details.write_row('A3', headers)

We set the column width and row height of the header rows on the detail tab, and then we write the headers which we extracted from the result set earlier.

size_format = workbook.add_format({'num_format': 0x03})
size_bold_format = workbook.add_format({'num_format': 0x03, 'bold': True})
objdate_format = workbook.add_format({'num_format': 'yyyy-mm-dd'})

# Display size as formatted integer.
worksheet_details.set_column('D:D', 16, size_format)
# Display used as formatted date.
worksheet_details.set_column('E:E', 14, objdate_format)

We create some formats for the object size and date and apply them to the columns D and E.

row = 2
for row_data in cur:
# Move on to the next worksheet row.
    row += 1
    col = 0
    for col_data in row_data:
        worksheet_details.write(row, col, col_data)
    # Move on to the next worksheet column.
    col += 1

object_count = row - 2

Now we populate the rows in the detail sheet. First we initiate the row counter to 2 (for row number 3 - Excel rows and columns are indexed starting with zero). Then we loop through the rows in the result set, increment the row counter, initiate the column counter and loop through the columns in the row, writing each one to a cell. After the loops are completed we retrieve the count of objects.

workbook.define_name('Detail!Objsizes', '=Detail!$D$4:$D$' + str(object_count + 3))
workbook.define_name('Detail!Objdates', '=Detail!$E$4:$E$' + str(object_count + 3))

Here we define two cell ranges in the Excel sheet. One cell range is called 'Detail!Objsizes' and covers all the values that we have written to column D. The same goes for the second cell range 'Detail!Objdates', which covers the values in column E.

worksheet_details.write(0, 0, 'Object count', bold)
worksheet_details.write(0, 1, '=SUBTOTAL(102, Detail!Objsizes)', size_bold_format)
worksheet_details.write(0, 2, 'Object size total', bold_right)
worksheet_details.write(0, 3, '=SUBTOTAL(109, Detail!Objsizes)', size_bold_format)

Now we write some text and formulas to the first row in the detail sheet. The formulas use the Excel function SUBTOTAL, which can calculate statistical values like count and sum according to the first parameter value sent to the function. Value "102" is a count and "109" is a sum. Both values make the SUBTOTAL function exclude any hidden cells in the result - this allows the user to hide one or more rows in the sheet and calculate the result without these rows. A very dynamic function indeed!

# Set autofilter on.
worksheet_details.autofilter('A3:E' + str(object_count + 3))

As the comment says, we set filtering on. This instruction enables autofiltering in the detail sheet on the values in the cell range from column A to E and row 3 to the count of objects plus 3, i.e. all the object detail information. With filtering the user can easily select or omit certain values in the data, e.g. one or more years in the object date values. Or sort the rows by one the columns in ascending or descending order. Again a very useful feature if the user wants to examine the object statistics further.

worksheet_summary.set_column('A:A', 16, bold)
worksheet_summary.set_column('B:E', 18, size_format)
worksheet_summary.write_row(0, 0, ('Library name', lib))
worksheet_summary.write_row(1, 0, ('Object count', '=COUNT(Detail!Objsizes)'))
worksheet_summary.write_row(2, 0, ('Object size total', '=SUM(Detail!Objsizes)'))
worksheet_summary.write(22'=CONCATENATE(TEXT(SUM(Detail!Objsizes)/1024/1024/1024, "#.##0"), " GB")', bold_right)

Now we populate the summary sheet. First we set the column sizes and format, and then we write some rows with the library name, object count, object size total and object size total in gigabytes. The object count and size total are formulas that again references the detail cell ranges that we defined above.

worksheet_summary.write_row(3, 0, ('Object age', '0-3 months', '3-6 months', '6-12 months', '>12 months'), bold_right)
worksheet_summary.write(4, 1, '=SUMIFS(Detail!Objsizes, Detail!Objdates, ">=" & EDATE(TODAY(), -3))', size_format)
worksheet_summary.write(4, 2, '=SUMIFS(Detail!Objsizes, Detail!Objdates, "<" & EDATE(TODAY(), -3), Detail!Objdates, ">=" & EDATE(TODAY(), -6))', size_format)
worksheet_summary.write(4, 3, '=SUMIFS(Detail!Objsizes, Detail!Objdates, "<" & EDATE(TODAY(), -6), Detail!Objdates, ">=" & EDATE(TODAY(), -12))', size_format)
worksheet_summary.write(4, 4, '=SUMIFS(Detail!Objsizes, Detail!Objdates, "<" & EDATE(TODAY(), -12))', size_format)

Next in the summary sheet we write rows that shows the sum of the object sizes in relation to the object age. This involves using the Excel function SUMIFS, that can summarize conditionally. We write four formulas: the size of objects created or used in the last 3 months, from 3 to 6 months, from 6 to 12 months and finally older than 12 months.

workbook.close()

Finally we close the Excel workbook. Until now the Excel workbook has been in memory, and to make it into a file we have to close the workbook.

...and we're done! We now have an Excel spreadsheet file containing summary and detailed information about the objects in a library, and the Excel user can filter and sort the data for further analysis.

The generated Excel files could use some polishing and brushing-up, but I will leave that up to you. Feel free to use this Python script as you like, maybe as a basis for further exploration of how to generate advanced Excel spreadsheets on IBM i. The one thing to remember is that you have all the components necessary to do this without any cost - you don't need to buy any additional software. Just install the Python RPM if you have not already done so.

IBM has done a great job adding open source technology to the platform, and Python is one of the best tools in many situations. This is one example of such a situation - in 100 lines of Python we have generated an advanced Excel spreadsheet file. Later I will show you another example where Python come to the rescue - removing rows from a comma-separated file based on the value of a column. Coming soon...

No comments: