Too Many Files? Feed Them to Python’s Voracious Glob Module

In my previous post I mentioned that a coworker had recently emailed me a folder full of over two hundred Excel files and asked me to extract some relevant data from each file. I noted how undertaking that task manually would have been time-consuming and error-prone and described how exhilarating it was to accomplish the task quickly by writing some Python code.

I didn’t show how to process multiple files with Python in that post because it is easier to understand the code for processing multiple files once you’re familiar with code for processing one file. So that’s why in that post I demonstrated how to read and write a single CSV file with Python. With that knowledge under our belts, we’re now prepared to understand Python code for processing multiple CSV files.

One good way to learn to code in Python is to create small datasets on your laptop and then write a Python script to process or manipulate them in some way, so that’s what we’ll do here. The following example demonstrates one way to read multiple CSV files (with similarly formatted data), concatenate the data from the files, and write the results to an output file.

One assumption I make in this example is that you’ve already visited http://www.python.org/ and downloaded and installed the version of Python that is compatible with your computer’s operating system. Another assumption is that all of the input files are located in the same folder. Also, unlike in my previous post, the script in this example can handle commas embedded in column values, e.g. $1,563.25, because it uses Python’s built-in csv module to parse each row.
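To see why the csv module matters for values like $1,563.25, here is a minimal sketch written for Python 3. The sample row is hypothetical (it is not taken from the input files), and it assumes the amount is wrapped in double quotes, which is how spreadsheet programs typically save a value that contains a comma.

```python
import csv
import io

# Hypothetical row: the amount contains a comma, so it is wrapped in quotes.
line = 'Jane Doe,"$1,563.25",1/15/2014\n'

# Naive splitting on commas breaks the amount into two pieces.
naive_fields = line.strip().split(',')

# csv.reader honors the quotes and keeps the amount in one piece.
csv_fields = next(csv.reader(io.StringIO(line)))

print(naive_fields)
print(csv_fields)
```

The naive split produces four pieces for a three-column row, while the csv reader returns three values with the amount intact.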

Ok, in order to process multiple CSV files we need to create multiple CSV files. Open Microsoft Excel and add the following data:

Figure I: 1st Input File - sales_january2014.csv

Now open the ‘Save As’ dialog box. In the location box, navigate to your Desktop so the file will be saved on your Desktop. In the format box, select ‘Comma Separated Values (.csv)’ so that Excel saves the file in the CSV format. Finally, in the ‘Save As’ or ‘File Name’ box, type “sales_january2014”. Click ‘Save’.

Ok, that’s one input file. Now let’s create a second input file. Open a new Excel workbook and add the following data:

Figure II: 2nd Input File - sales_february2014.csv

Now open the ‘Save As’ dialog box. In the location box, navigate to your Desktop so the file will be saved on your Desktop. In the format box, select ‘Comma Separated Values (.csv)’ so that Excel saves the file in the CSV format. Finally, in the ‘Save As’ or ‘File Name’ box, type “sales_february2014”. Click ‘Save’. Ok, now we have two CSV input files, one for January and one for February. We’ll stick with two input files in this example to keep it simple, but please keep in mind that the code in this example can handle many more files; that is, it will scale well.

Now that we have two CSV files to work with, let’s create a Python script to read the files and write their contents to an output file. Open your favorite text editor (e.g. Notepad) and add the following lines of code:

#!/usr/bin/python
import csv
import glob
import os
import sys

input_path = sys.argv[1]   # folder that contains the input CSV files
output_file = sys.argv[2]  # path to and name of the output CSV file

filewriter = csv.writer(open(output_file,'wb'))  # 'wb' is the Python 2 mode for csv output
file_counter = 0
for input_file in glob.glob(os.path.join(input_path,'*.csv')):
    with open(input_file,'rU') as csv_file:
        filereader = csv.reader(csv_file)
        if file_counter < 1:
            for row in filereader:  # first file: keep the header row
                filewriter.writerow(row)
        else:
            header = next(filereader,None)  # later files: skip the header row
            for row in filereader:
                filewriter.writerow(row)
    file_counter += 1

The first line is a shebang comment: on Unix-like systems, including Mac, it tells the operating system which interpreter should run the script, and Windows simply ignores it, so the same script works across operating systems. The next four lines import additional built-in Python modules so that we can use their methods and functions. You can read more about these and other built-in modules at: http://docs.python.org/2/library/index.html.

The sixth line uses argv from the sys module to grab the first piece of information after the script name on the command line, the path to and name of the input folder, and assigns it to the variable input_path. Similarly, the seventh line grabs the second piece of information after the script name, the path to and name of the output file, and assigns it to the variable output_file.
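As a small illustration of how the command-line pieces line up with sys.argv, here is a sketch that works on a list shaped like sys.argv. The helper name parse_arguments and its usage message are my own assumptions for illustration; the script above reads sys.argv[1] and sys.argv[2] directly.

```python
def parse_arguments(argv):
    """Return (input_path, output_file) from a list shaped like sys.argv."""
    # argv[0] is the script name, so two arguments means three items.
    if len(argv) != 3:
        raise SystemExit('usage: python process_many_csv_files.py <input_folder> <output_file>')
    return argv[1], argv[2]

# Simulates typing: python process_many_csv_files.py . sales_summary.csv
simulated_argv = ['process_many_csv_files.py', '.', 'sales_summary.csv']
input_path, output_file = parse_arguments(simulated_argv)

print(input_path)
print(output_file)
```

In a real script you would pass sys.argv (after importing sys) instead of the simulated list. Checking the number of arguments up front gives a clearer message than the IndexError you would otherwise see when an argument is missing.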

The eighth line uses the csv module to open the output file in write ‘w’ mode and create a writer object, filewriter, for writing to the output file. The ‘b’ enables a distinction between binary and text files for systems that differentiate between binary and text files, but for systems that do not, the ‘b’ has no effect. The ninth line creates a variable, file_counter, to store the count of the number of files processed and initializes it to zero.

The tenth line creates a list of the input files to be processed and also starts a “for” loop for looping through each of the input files. There is a lot going on in this one line, so let’s talk about how it works. os.path.join joins the two components between its parentheses. input_path is the path to the folder that contains the input files and ‘*.csv’ represents any file name that ends in ‘.csv’.

glob.glob expands the asterisk ‘*’, a Unix shell wildcard character, in ‘*.csv’ into the actual file names that match the pattern. Together, glob.glob and os.path.join create a list of our two input files, e.g. ['C:\Users\Clinton\Desktop\sales_january2014.csv', 'C:\Users\Clinton\Desktop\sales_february2014.csv']. Finally, the “for” loop syntax executes the lines of code beneath this line for each of the input files in this list.
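If you would like to try os.path.join and glob.glob on their own, the following Python 3 sketch builds a throwaway folder so it doesn't touch your Desktop. The file notes.txt is a hypothetical extra file, included only to show that the pattern ignores it. Since the order of glob.glob's results isn't guaranteed, the sketch sorts them.

```python
import glob
import os
import tempfile

# Build a temporary folder with two empty CSV files and one non-CSV file.
with tempfile.TemporaryDirectory() as input_path:
    for name in ('sales_january2014.csv', 'sales_february2014.csv', 'notes.txt'):
        open(os.path.join(input_path, name), 'w').close()

    # Join the folder and the wildcard pattern into one search string.
    pattern = os.path.join(input_path, '*.csv')

    # Sort the matches so the result is the same on every system.
    matches = sorted(glob.glob(pattern))
    file_names = [os.path.basename(path) for path in matches]

print(file_names)
```

Only the two files ending in ‘.csv’ are returned. Sorting is alphabetical, so February comes before January here; if the order of the rows in your output matters, you may want to name your files so they sort the way you need.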

The eleventh line uses a “with” statement to open each input file in read ‘r’ mode. The ‘U’ turns on universal newlines mode, so lines ending in the Unix, old Mac, or Windows newline convention are all recognized; if your version of Python is built without universal newlines support, the ‘U’ has no effect. The twelfth line uses the csv module to create a reader object, filereader, for reading each input file.

The thirteenth line creates an “if-else” statement for distinguishing between the first input file and all subsequent input files. The first time through the “for” loop file_counter equals zero, which is less than one, so the “if” block is executed. The code in the “if” block writes every row of data in the first input file, including the header row, to the output file.

At the bottom of the “for” loop, after processing the first input file, we add one to file_counter. Therefore, the second time through the “for” loop file_counter is not less than one, so the “else” block is executed. The code in the “else” block uses Python’s built-in next() function to read the first row, i.e. the header row, of the second and subsequent input files into the variable, header, so that it is not written to the output file. The remaining code in the “else” block writes the remaining rows in the input file, the rows of data beneath the header row, to the output file.
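The script above uses the Python 2 file modes ‘wb’ and ‘rU’. If you are running Python 3, the csv module expects text-mode files opened with newline='' instead. The following is a sketch of the same header-skipping logic for Python 3, not a replacement for the script above. The function name concatenate_csv_files and the sample rows are my own assumptions. The sketch also skips the output file if it happens to sit in the input folder, a precaution I added because the output name also ends in ‘.csv’.

```python
import csv
import glob
import os
import tempfile

def concatenate_csv_files(input_path, output_file):
    """Write one header row plus the data rows of every CSV file in input_path."""
    output_abs = os.path.abspath(output_file)
    input_files = sorted(
        path for path in glob.glob(os.path.join(input_path, '*.csv'))
        if os.path.abspath(path) != output_abs  # never read our own output
    )
    # newline='' lets the csv module manage line endings itself.
    with open(output_file, 'w', newline='') as out:
        filewriter = csv.writer(out)
        for file_counter, input_file in enumerate(input_files):
            with open(input_file, 'r', newline='') as csv_file:
                filereader = csv.reader(csv_file)
                if file_counter > 0:
                    next(filereader, None)  # discard the repeated header row
                filewriter.writerows(filereader)

# Demonstration with hypothetical data in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    samples = {
        'sales_january2014.csv': [['Customer', 'Amount'], ['A', '$1,563.25']],
        'sales_february2014.csv': [['Customer', 'Amount'], ['B', '$20.00']],
    }
    for name, rows in samples.items():
        with open(os.path.join(folder, name), 'w', newline='') as f:
            csv.writer(f).writerows(rows)

    summary_path = os.path.join(folder, 'sales_summary.csv')
    concatenate_csv_files(folder, summary_path)

    with open(summary_path, newline='') as f:
        summary_rows = list(csv.reader(f))

print(summary_rows)
```

Using enumerate() replaces the manual file_counter bookkeeping, and the “with” statement around the output file makes sure it is closed when the work is done.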

Now that we understand what the code is supposed to do, let’s save this file as a Python script and use it to process our two input files. To save the file as a Python script, open the ‘Save As’ dialog box. In the location box, navigate to your Desktop so the file will be saved on your Desktop. In the format box, select ‘All Files’ so that the dialog box doesn’t select a specific file type. Finally, in the ‘Save As’ or ‘File Name’ box, type process_many_csv_files.py. Click ‘Save’. Now you have a Python script you can use to process multiple CSV files.

Figure III: Python Script - process_many_csv_files.py

To use process_many_csv_files.py to read and write the contents of our two input files, open a Command Prompt (Windows) or Terminal (Mac) window. When the window opens the prompt will be in a particular folder, also known as a directory (e.g. “C:\Users\Clinton\Documents”). The next step is to navigate to the Desktop, where we saved the Python script.

To move between folders, you can use the ‘cd’ command, a Unix command which stands for change directory. To move up and out of the ‘Documents’ folder into the ‘Clinton’ folder, type the following and then hit Enter:

cd ..

That is, the letters ‘cd’ together followed by one space followed by two periods. The two periods ‘..’ stand for up one level. At this point, the prompt should look like “C:\Users\Clinton”. Now, to move down into a specific folder you use the same ‘cd’ command followed by the name of the folder you want to move into. Since the ‘Desktop’ folder resides in the ‘Clinton’ folder, you can move down into the ‘Desktop’ folder by typing the following and then hitting Enter:

cd Desktop

At this point, the prompt should look like “C:\Users\Clinton\Desktop” in the Command Prompt and we are exactly where we need to be since this is where we saved the Python script and two CSV input files. The next step is to run the Python script.

To run the Python script, type one of the following commands on the command line, depending on your operating system, and then hit Enter:

Windows:
python process_many_csv_files.py . sales_summary.csv

That is, type python, followed by a single space, followed by process_many_csv_files.py, followed by a single space, followed by a single period, followed by a single space, followed by sales_summary.csv, and then hit Enter. The single period refers to the current directory, your Desktop (i.e. the folder that contains your two input files).

Mac:
chmod +x process_many_csv_files.py
./process_many_csv_files.py . sales_summary.csv

That is, type chmod, followed by a single space, followed by +x, followed by a single space, followed by process_many_csv_files.py, and then hit Enter. This command makes the Python script executable. Then type ./process_many_csv_files.py, followed by a single space, followed by a single period, followed by a single space, followed by sales_summary.csv, and then hit Enter.

After you hit Enter, you should not see any new output in the Command Prompt or Terminal window. However, if you minimize all of your open windows and look at your Desktop there should be a new CSV file called sales_summary.csv. Open the file. The contents should look like:

Figure IV: Output File - sales_summary.csv

As you can see, a single header row and the six rows of data from the two input files were successfully written to the output file, sales_summary.csv. Often, this procedure of concatenating multiple input files into a single output file is all you need to do to begin your analysis. However, sometimes you may not need all of the rows or columns in the output file. Or you may need to modify the data or perform a calculation before writing it to the output file. In many cases, you would only need to make slight modifications to the code discussed above to alter the data written to your output file.
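As one illustration of such a slight modification, the sketch below (Python 3) keeps only some of the columns and converts the dollar amounts to numbers so they can be totaled. The column names, the sample rows, and the helper parse_amount are all hypothetical; substitute the headers from your own files.

```python
import csv
import io

# Hypothetical stand-in for sales_summary.csv; your column names may differ.
summary_text = (
    'Customer ID,Customer Name,Sale Amount\n'
    '1234,John Smith,"$1,563.25"\n'
    '2345,Mary Harrison,$3.00\n'
)

def parse_amount(text):
    """Turn a string such as '$1,563.25' into the number 1563.25."""
    return float(text.replace('$', '').replace(',', ''))

reader = csv.DictReader(io.StringIO(summary_text))

# Keep two of the three columns and convert the amount to a number.
trimmed_rows = [(row['Customer Name'], parse_amount(row['Sale Amount'])) for row in reader]
total_sales = sum(amount for _, amount in trimmed_rows)

print(trimmed_rows)
print(total_sales)
```

csv.DictReader uses the header row as keys, which lets you refer to columns by name rather than by position.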

In this example there were only two input files, but the code generalizes to as many input files as your computer can handle. So if you need to concatenate the data in a few dozen, hundred, or thousand CSV files, the code is reusable essentially as-is. This ability to automate and scale repetitive procedures on files and data is one of the great advantages of learning to code (even a little bit).

Now you know that it only takes a few lines of code to automate a process that, given a larger number of files, would be time-consuming and error-prone or even infeasible. Being able to quickly accomplish a task that would be difficult or impossible to do manually is empowering. Add in the benefit of eliminating manual “copy/paste” errors (once you debug your code), and the new capability is really exciting. Having read this post, I hope you’re now more familiar with Python and eager to begin using it. If you have any questions, please reply to this post.

The Two Rs of Python: Reading and Writing CSV Files

A coworker recently emailed me a folder full of over two hundred Excel files and asked me to extract some relevant data from each file.  A few years ago that task would have been daunting.  Like many other business people, I would have had to manually open each of the files and extract (copy and paste) the relevant data into a new output file.  With so many files, the undertaking would have required a lot of time and been prone to errors, even with careful attention to detail.

Thankfully, I had already learned how to code in Python, an easy-to-learn programming language that is great for processing files and manipulating data.  With Python, I was able to write a short script that read each of the files, extracted the relevant data from each file, and wrote that data to an output file.  The whole process took a couple of hours, compared to the days or weeks it would have taken to do it manually.

This type of capability, reducing the time it takes to complete a task from weeks to hours, is exhilarating.   That’s one of the reasons why I’m going to start including “analytics” topics like this one in this blog.  Another benefit of learning to do your work with Python is that the method scales.  As I described in my story, as the number of files grows it simply becomes more and more difficult to complete the work in a timely fashion.  Once you learn how to use Python to process and manipulate files and data, you won’t want to return to tiresome manual processes.

One of the ways I learned to code in Python was to create small datasets in a folder on my laptop and then write a Python script to process or manipulate them in some way.  Since I found this method to be really helpful, that’s how I’m going to present the example in this blog post.  Processing multiple files isn’t that complicated once you understand how all of the lines of code work together, but it does involve several concepts that are easier to understand once you know how to process a single file, so that’s where we’ll start.

The following example demonstrates one way to read a single comma separated values (CSV) file and write its contents to an output file.  One assumption I make in this example is that you’ve already visited http://www.python.org/ and downloaded and installed the version of Python that is compatible with your computer’s operating system.  Another assumption is that none of the values in the spreadsheet contains commas.  Ok, in order to work with a CSV file we need to create a CSV file.  Open Microsoft Excel and add the following data:

Figure I: Input File - customers.csv

Now open the ‘Save As’ dialog box.  In the location box, navigate to your Desktop so the file will be saved on your Desktop.  In the format box, select ‘Comma Separated Values (.csv)’ so that Excel saves the file in the CSV format.  Finally, in the ‘Save As’ or ‘File Name’ box, type “customers”.  Click ‘Save’.  That’s all there is to it.  Now you have a CSV file called customers.csv that you can read into Python.

Now that we have a CSV file to work with, let’s create a Python script to read the file and write its contents to an output file.  Open your favorite text editor (e.g. Notepad) and add the following lines of code:

#!/usr/bin/python
import string
import sys

input_file = sys.argv[1]   # path to and name of the input CSV file
output_file = sys.argv[2]  # path to and name of the output CSV file

filewriter = open(output_file, 'wb')

with open(input_file, 'rU') as filereader:
    for row in filereader:
        row = row.strip()            # drop the trailing newline
        row_list = row.split(',')    # one list item per column value
        filewriter.write(','.join(map(str,row_list))+'\n')
filewriter.close()

The first line is a shebang comment: on Unix-like systems, including Mac, it tells the operating system which interpreter should run the script, and Windows simply ignores it, so the same script works across operating systems.  The second line imports Python’s built-in string module.  Strictly speaking, this import is optional here because strip() and split() are methods that every Python string already provides.  The third line imports Python’s built-in sys module so that we can supply additional information to the script on the command line in either the Command Prompt (Windows) or Terminal (Mac) window.

The fourth line uses argv from the sys module to grab the first piece of information after the script name on the command line, the path to and name of the input file, and assigns it to the variable input_file.  Similarly, the fifth line grabs the second piece of information after the script name, the path to and name of the output file, and assigns it to the variable output_file.

The sixth line opens the output file in write ‘w’ mode and creates a writer object, filewriter, for writing to the output file.  The ‘b’ enables a distinction between binary and text files for systems that differentiate between binary and text files, but for systems that do not, the ‘b’ has no effect.

The seventh line uses a “with” statement to open the input file in read ‘r’ mode and creates a reader object, filereader, for reading the input file.  The ‘U’ turns on universal newlines mode, so lines ending in the Unix, old Mac, or Windows newline convention are all recognized; if your version of Python is built without universal newlines support, the ‘U’ has no effect.

The eighth line uses a “for” loop to loop through the rows in the input file one by one.  The next three lines of code occur for each row in the input file since they’re contained in the body of the “for” loop (you can tell they’re in the body of the loop because they’re indented).  Each row of data in the input file enters the “for” loop as a string that contains column values separated by commas.

The ninth line uses the string’s strip() method to remove spaces, tabs, and newline characters from both ends of the string and then assigns that stripped version of the string to the variable row.  The tenth line uses the string’s split() method to split the string on the embedded commas into a list of column values and then assigns the list to the variable row_list.

The eleventh line uses the map() and str() functions to convert each of the values in row_list to a string (they are already strings here, so this step is simply a safeguard).  Then the join() method adds commas between the strings and joins all of the strings and commas together into a single string.  Next, a newline character is added to the end of the string.  Then the filewriter’s write() method writes the string to the output file.  This process of stripping, splitting, manipulating, and writing occurs for each of the rows in the input file.  Finally, in the twelfth line, the close() method closes the filewriter object.
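To make the strip, split, and join steps concrete, here is a sketch that applies them to a single row outside of any file. The sample row is hypothetical, and it ends in a Windows-style newline to show what strip() removes.

```python
# Hypothetical header row as it might arrive from a file, newline included.
row = 'Customer Name,Cost,Purchase Date\r\n'

# Step 1: remove spaces, tabs, and newline characters from both ends.
stripped = row.strip()

# Step 2: split on the commas into a list of column values.
row_list = stripped.split(',')

# Step 3: join the values back together with commas and add a newline.
rebuilt = ','.join(map(str, row_list)) + '\n'

print(row_list)
print(repr(rebuilt))
```

The list in the middle is where you would make changes, for example dropping a column or reformatting a value, before the row is joined back together and written out.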

Now that we understand what the code is supposed to do, let’s save this file as a Python script and use it to process the data in our input file, customers.csv.  To save the file as a Python script, open the ‘Save As’ dialog box.  In the location box, navigate to your Desktop so the file will be saved on your Desktop.  In the format box, select ‘All Files’ so that the dialog box doesn’t select a specific file type.  Finally, in the ‘Save As’ or ‘File Name’ box, type process_csv_file.py.  Click ‘Save’.  Now you have a Python script you can use to read and write the contents of a CSV file.

Figure II: Python Script - process_csv_file.py

To use process_csv_file.py to read and write the contents of customers.csv, open a Command Prompt (Windows) or Terminal (Mac) window.  When the window opens the prompt will be in a particular folder, also known as a directory (e.g. “C:\Users\Clinton\Documents”).  The next step is to navigate to the Desktop, where we saved the Python script.

To move between folders, you can use the ‘cd’ command, a Unix command which stands for change directory.  To move up and out of the ‘Documents’ folder into the ‘Clinton’ folder, type the following and then hit Enter:

cd ..

That is, the letters ‘cd’ together followed by one space followed by two periods.  The two periods ‘..’ stand for up one level.  At this point, the prompt should look like “C:\Users\Clinton”.  Now, to move down into a specific folder you use the same ‘cd’ command followed by the name of the folder you want to move into.  Since the ‘Desktop’ folder resides in the ‘Clinton’ folder, you can move down into the ‘Desktop’ folder by typing the following and then hitting Enter:

cd Desktop

At this point, the prompt should look like “C:\Users\Clinton\Desktop” in the Command Prompt and we are exactly where we need to be since this is where we saved the Python script and CSV input file.  The next step is to run the Python script.

To run the Python script, type one of the following commands on the command line, depending on your operating system, and then hit Enter:

Windows:

python process_csv_file.py customers.csv my_output.csv

That is, type python, followed by a single space, followed by process_csv_file.py, followed by a single space, followed by customers.csv, followed by a single space, followed by my_output.csv, and then hit Enter.

Mac:

chmod +x process_csv_file.py

./process_csv_file.py customers.csv my_output.csv

That is, type chmod, followed by a single space, followed by +x, followed by a single space, followed by process_csv_file.py, and then hit Enter.  This command makes the Python script executable.  Then type ./process_csv_file.py, followed by a single space, followed by customers.csv, followed by a single space, followed by my_output.csv, and then hit Enter.

After you hit Enter, you should not see any new output in the Command Prompt or Terminal window.  However, if you minimize all of your open windows and look at your Desktop there should be a new CSV file called my_output.csv.  Open the file.  The contents should look like:

Figure III: Output file - my_output.csv

As you can see, the data in the input file, customers.csv, was successfully written to the output file, my_output.csv.  This example demonstrates the basic process of reading a CSV file and writing data to an output file.  The example is slightly unrealistic in that there isn’t any reason to re-write the data in the input file to an output file.  However, it’s easy to imagine business logic that would make the output more interesting.  Perhaps you only need the rows where the cost is greater than a specific threshold or where the purchase date is before a particular cut-off date.  You would only need to make slight modifications to the code discussed above to add this business logic to your script.
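Here is a sketch of the cost-threshold idea, using the same strip and split approach as the script above. The column layout (cost in the second column), the sample rows, the threshold, and the function name filter_rows are all assumptions for illustration; adjust them to match your own customers.csv.

```python
# Hypothetical rows; the cost is assumed to be in the second column.
input_lines = [
    'Customer,Cost,Purchase Date\n',
    'Acme,250.00,1/5/2014\n',
    'Bolt,75.50,1/9/2014\n',
    'Core,410.10,2/1/2014\n',
]
COST_THRESHOLD = 100.0  # hypothetical cut-off

def filter_rows(lines, threshold):
    """Keep the header row plus every row whose cost exceeds the threshold."""
    kept = []
    for line_number, line in enumerate(lines):
        row_list = line.strip().split(',')
        # Line 0 is the header, so it is kept without a numeric test.
        if line_number == 0 or float(row_list[1]) > threshold:
            kept.append(row_list)
    return kept

filtered = filter_rows(input_lines, COST_THRESHOLD)

for row_list in filtered:
    print(','.join(row_list))
```

In the full script, the “if” test would sit inside the “for” loop just before the line that writes each row to the output file.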

This blog post covered processing a single CSV file, but one of the real advantages of learning to code in Python is the ability to automate and scale your processes to handle many files at a time.  In a future post, I’ll cover modifications you can make to the code discussed in this post to create a Python script that processes multiple CSV files.  Having read this post, I hope you’re now more familiar with Python and excited to start using it.  If you have any questions, please reply to this post.

To Read or Not to Read? Hopefully the Former!

It has been far too long since I posted to this blog.  It’s time to let you know what I’ve been working on over the past few months.  I’ve been writing a book on a topic that is near and dear to my heart.

The book is titled Multi-objective Decision Analysis: Managing Trade-offs and Uncertainty.  It is an applied, concise book that explains how to conduct multi-objective decision analyses using spreadsheets.

The book is scheduled to be published by Business Expert Press in 2013.  For a little more information about my forthcoming book, please read the abstract shown below:

“Whether managing strategy, operations, or products, making the best decision in a complex uncertain business environment is challenging.  One of the major difficulties facing decision makers is that they often have multiple, competing objectives, which means trade-offs will need to be made.  To further complicate matters, uncertainty in the business environment makes it hard to explicitly understand how different objectives will impact potential outcomes.  Fortunately, these problems can be solved with a structured framework for multi-objective decision analysis that measures trade-offs among objectives and incorporates uncertainties and risk preferences.

This book is designed to help decision makers by providing such an analysis framework implemented as a simple spreadsheet tool.  This framework helps structure the decision making process by identifying what information is needed in order to make the decision, defining how that information should be combined to make the decision and, finally, providing quantifiable evidence to clearly communicate and justify the final decision.

The process itself involves minimal overhead and is perfect for busy professionals who need a simple, structured process for making, tracking, and communicating decisions.  With this process, decision making is made more efficient by focusing only on information and factors that are well-defined, measurable, and relevant to the decision at hand.  The clear characterization of the decision required by the framework ensures that a decision can be traced and is consistent with the intended objectives and organizational values.  Using this structured decision-making framework, anyone can effectively and consistently make better decisions to gain a competitive and strategic advantage.”

Look for my forthcoming book, Multi-objective Decision Analysis, on the bookshelves in 2013!

Know Thyself: The Value of Specifying Your Values

A few days ago I was chatting with a friend about a new company initiative she’s going to be working on.  The new initiative is an important strategic maneuver for the company.  If it is designed and implemented well, the company will be well-positioned to compete in the new market.  However, if it isn’t designed or implemented well, it will significantly drain the company’s limited management and financial resources and likely weaken the company’s competitive edge.

As our conversation continued, one important problem with the new initiative became apparent.  Based on available information, it appears the overall strategy and objectives for the new initiative aren’t well thought out.  This is an important problem for several reasons – it makes it difficult for workers to know what tasks need to be done to advance the initiative, makes it difficult for managers to track and guide progress on all of the project’s phases, and impedes clear communication.

Having identified the problem with the current state of the initiative, namely, that management has not clearly defined what they want to achieve with the initiative, we began discussing potential solutions.  The first thing we wanted to do was brainstorm the factors we thought were important to the company with respect to the initiative; however, it became clear right away that first we needed to define some terminology.

The problem was that we were using different words to describe the ideas we wanted to discuss.  One minute we were talking about the company’s values, another minute we were talking about the company’s objectives, and the next minute we were talking about the company’s goals.  We needed to agree on the definitions of the terms we were using.  After spending some time discussing possible definitions, we agreed on the following definitions:

  • Values are the areas of concern, considerations, or matters you think are significant enough to be taken into account when evaluating alternatives.  For example, values for the company considering alternative ways of rolling out the initiative may be ease of implementation, company image, and profit.
  • Objectives augment values by specifying the preferred direction of movement.  Thus, the company considering alternative ways of rolling out the initiative would find an alternative that is easier to implement more desirable.
  • Goals are thresholds of achievement with respect to values and objectives that are either achieved or not by an alternative that’s being evaluated.  For example, the company might have a goal of implementing the initiative within eight weeks.  For a given alternative, this goal may or may not be achievable.
  • Finally, underlying each of these terms is the idea of a measure, a measuring scale for the degree of attainment of an objective.  For example, the company may use “annual profit in dollars” as the measure for the objective of increasing profit.

By thinking carefully about these terms, and agreeing on their definitions, we developed a common vocabulary we could use to communicate clearly with one another.  Having defined the terminology, we brainstormed the company’s values and objectives with respect to the initiative and even defined the measures we would use to gauge the degree of attainment of the objectives.

From start to finish, the whole conversation didn’t take more than an hour, but by the end we had a shared understanding of the values and objectives the company should try to achieve with the initiative, a shared understanding of how we would measure degrees of attainment of the objectives, and significant insight into how we would evaluate the alternative ways of rolling out the initiative.  Not bad for an hour-long conversation.

This structuring process and its terminology are applicable to any situation in which you want to evaluate alternatives based on multiple values and objectives.  By specifying the values you want to use to evaluate your alternatives and defining the objectives and measures you’ll use to judge the degree of attainment of your objectives, you’ll reap the significant benefits of thoroughly understanding your decision situation, being able to articulate a clear rationale for your decision, and being able to identify the alternative that, according to your values and objectives, is the most valuable to you.

You Might Be a Decision Analyst If…

Over the holidays, I visited my sister and brother-in-law.  While we were chatting, he mentioned he wanted to refurbish some audio equipment and was trying to decide between fixing the electrical components himself and hiring an expert to do it for him.  He said he was concerned with the potential cost and quality of the repairs, as well as how long it would take to complete the repairs, but he was uncertain about the range of values these three factors could take under the two alternatives.  I must have been smiling from ear to ear.  Without even realizing it, he’d described a simple, yet interesting multi-objective decision under uncertainty, and I wanted to show him how I could help him with his decision situation.

I pulled out a sheet of paper and quickly drew a rough influence diagram to represent his decision, the three uncertainties, and the resulting outcome “node” of his confidence or pride in the refurbished system.  Then I asked him about the range of costs the components could take and the range of time and quality the repairs could take.  After drafting probability distributions for the three factors, we chatted about his value functions and weights for the three factors and assigned scores and probabilities to the two alternatives across all three factors.  I transferred all of this information, which only took a few minutes to collect, into a spreadsheet and quickly churned out preliminary certainty equivalents for the two alternatives.   After performing a quick sensitivity analysis on some of the model’s inputs, including the weights and multi-attribute risk tolerance, we determined we had robust certainty equivalents for the alternatives.
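For readers curious about what a certainty equivalent calculation can look like, here is a rough sketch in Python. It assumes an additive value function (a weighted sum of scores between 0 and 1) and an exponential utility function, which is one common formulation and not necessarily the one in my spreadsheet. Every weight, score, probability, and the risk tolerance below is hypothetical.

```python
import math

# All numbers below are hypothetical and chosen only for illustration.
WEIGHTS = {'cost': 0.5, 'quality': 0.3, 'time': 0.2}
RISK_TOLERANCE = 0.5  # expressed on the same 0-to-1 scale as the values

def overall_value(scores):
    """Additive value function: weighted sum of scores between 0 and 1."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

def certainty_equivalent(outcomes, risk_tolerance):
    """Certainty equivalent under exponential utility.

    outcomes is a list of (probability, value) pairs whose
    probabilities sum to one.
    """
    expected = sum(p * math.exp(-v / risk_tolerance) for p, v in outcomes)
    return -risk_tolerance * math.log(expected)

# Two possible outcomes for each alternative (hypothetical scenarios).
do_it_yourself = [
    (0.5, overall_value({'cost': 1.0, 'quality': 0.8, 'time': 0.6})),
    (0.5, overall_value({'cost': 0.6, 'quality': 0.2, 'time': 0.1})),
]
hire_expert = [
    (0.9, overall_value({'cost': 0.3, 'quality': 0.9, 'time': 0.8})),
    (0.1, overall_value({'cost': 0.2, 'quality': 0.6, 'time': 0.5})),
]

ce_diy = certainty_equivalent(do_it_yourself, RISK_TOLERANCE)
ce_expert = certainty_equivalent(hire_expert, RISK_TOLERANCE)

print(ce_diy)
print(ce_expert)
```

With a positive risk tolerance, each certainty equivalent comes out below the alternative’s expected value, which reflects risk aversion. Rerunning the calculation with different weights or a different risk tolerance is a simple form of the sensitivity analysis described above.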

My brother-in-law was pleased with the whole exercise.  It helped him frame and understand the decision situation.  By breaking the problem down into objectives, uncertainties, scores, and values, he could clearly see how changes in those factors would affect which alternative was preferred.  Best of all, in this case, the analysis was quick and free…unless you count the cost of the sandwiches we ate while we worked on the problem.  I enjoyed conducting this decision analysis with my brother-in-law and was very pleased that he thought it had been a valuable use of his lunchtime.

Do you have a similar story?  Have you ever helped a friend or family member with an interesting decision problem?  Are you working on an interesting problem at work?  When you get a chance, please take a few minutes to write about and share your experience.  You can think of it as your chance to tell the next “You might be a decision analyst if…” story.

I Think, Therefore My Brain Hurts

The last time you chose a health plan for yourself and your family, how much effort did you expend learning about and comparing the options before choosing a plan?  The last time you implemented a new strategy for your organization, how much effort did you expend coming up with and considering the alternatives before implementing the strategy?  The last time you updated your investment portfolio, how much effort did you expend researching and evaluating your opportunities before updating your portfolio?

Thinking Is Hard, So I’m Going to Stop Now

I’d be willing to bet that a common, honest response to these and similarly important questions is, “Not as much effort as I could or should have spent”.  There are lots of reasons why we might not spend enough time and energy thinking before making major decisions – we might not know what we want to achieve, we might not be able to collect or interpret much of the relevant information, we might not know how to deal with uncertainties, or we might be pressed for time.  In these situations, the costs of additional thought may loom larger in our minds than the potential benefits, so we halt our thinking process and simply make a decision.

As an example, in the context of group decision making, think about the last time you were in a group meeting, the group had one important item left to discuss and vote on (e.g. whether to fund a $200,000 project), and the meeting was already running over the time allocated for the meeting.  How much time and energy did the group spend discussing the important item before voting?  Probably not as much as they could or should have spent.

Thinking Hard about the Benefits of Thinking Hard

One reason we might undervalue the potential benefits of additional thinking in our “think more/stop thinking” calculus, and so cut our thinking short in order to make a decision, is that we don’t appreciate how many benefits thinking offers.  However, the benefits of thinking (that is, spending the time and energy necessary to think hard about issues) are abundant and consequential.  Even just four benefits are enough to give you a feel for the significance of further thinking:

1. Thinking helps you clarify your goals and preferences

2. Thinking helps you reframe your problem and identify alternatives

3. Thinking helps you discover what additional information you need and, possibly, where to get it

4. Thinking helps you have control over your actions

A noteworthy fifth benefit of in-depth thinking is that it helps you develop a habit of thinking methodically and thoroughly.  The more time you spend thinking hard about issues, even simple ones, the more likely you are to become comfortable with effective techniques for thinking about and dealing with complicated issues.  In this way, the short-term costs of thinking hard are dwarfed by the significant long-term benefits.

Think of Thinking Hard as a Healthy Habit

That (at times) we don’t expend enough effort thinking before making major decisions is understandable – thinking is time-consuming, it’s costly, it’s hard work.  One reason why we might not think enough, even when we know additional thinking could be helpful, is that we don’t fully appreciate the long-term benefits of putting in the effort to think hard in the short term.

By keeping in mind the wide-ranging benefits of thinking deeply, you’ll be able to properly evaluate whether, in a specific case, additional thinking would be worthwhile.  You’ll be able to learn and practice using techniques for analyzing and solving problems.  And, ultimately, you’ll be able to use the tools and strategies you’ve learned to make better decisions.

How Many Ways Can I Arrange My Words And Still Get My Point Across? May I Repeat Myself?

A couple of years ago, my friends and I went to an indoor shooting range for a friend’s bachelor party.  One of the firearms we chose was a .44 Magnum, a revolver made famous by Clint Eastwood’s “Dirty Harry” films.  Holding the revolver in my hand, I thought, “Why is this weapon called a revolver?  Doesn’t the cylinder rotate around its axis?  Perhaps it would be more accurate to call this weapon a rotator.”  I’ll admit, that’s probably not the first thing most people think about when they hold a revolver in their hand (besides, maybe the weapon is called a revolver because the bullets revolve about the cylinder’s axis).

In any case, the experience reminded me of another product that has a confusing (and arguably inaccurate) name, the combination lock.  A combination lock is a type of lock in which you have to enter a specific sequence of numbers to open the lock.  The name is confusing because, in mathematics, the correct term for a selection of numbers in which the order matters is a permutation.  While it probably wouldn’t be very productive to try to change the naming convention for the combination lock, it is important to understand the difference between combinations and permutations.

Deos Oedrr Mettar?  Does Order Matter?

To know whether you’re dealing with a combination or a permutation, ask yourself one question – does order matter?  If it does, you’re dealing with a permutation.  If it doesn’t, you’re dealing with a combination.  After determining whether order matters, you need to find out whether repetition is allowed in the selection process.  For both combinations and permutations, the formulas are different depending on whether repetition is allowed.

In an earlier article, I mentioned I’m on a local policy advisory board and soon we’re going to elect three members of our 15-person board into leadership roles – Chair, Vice-chair, and 2nd Vice-chair.  These are distinct roles, which are in decreasing order of responsibility.  It’s fairly obvious that one outcome (e.g. Tess being elected Chair, Filbert being elected Vice-chair, and Laura being elected 2nd Vice-chair) is very different from another outcome (e.g. Laura being elected Chair, Tess being elected Vice-chair, and Filbert being elected 2nd Vice-chair).

Permutation – When Order Matters

Since order matters in this example, we know we’re dealing with a permutation, and since each person can only hold one position, we also know repetition is not allowed.  Given this information, the number of ways we could select three members of our 15-person board for the ordered leadership positions is 2,730.[1]
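
If you'd like to check that figure yourself, here is a minimal Python sketch.  It assumes Python 3.8 or later (for math.perm), and the member names are made-up placeholders rather than the actual board members.  It counts the ordered slates two ways: by listing every one of them, and by applying the formula directly.

```python
import math
from itertools import permutations

# Hypothetical stand-ins for the 15 board members.
board = [f"member_{i}" for i in range(1, 16)]

# Each ordered triple is one (Chair, Vice-chair, 2nd Vice-chair) outcome.
ordered_slates = list(permutations(board, 3))

print(len(ordered_slates))  # count by brute-force enumeration
print(math.perm(15, 3))     # same count from n! / (n - r)!
```

Both lines print 2730, so the enumeration and the formula agree.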

Combination – When Order Doesn’t Matter

A similar example in which order does not matter is when we’re selecting a subset of members to be on one of the board’s sub-committees.  In this case, a sub-committee composed of Tess, Filbert, and Laura is identical to a sub-committee composed of Laura, Tess, and Filbert.  That is, the order of the members doesn’t matter.  And since we can’t select a person twice for the same sub-committee, once again we know repetition is not allowed.  Given this information, the number of ways we could select three members of our 15-person board for the unordered sub-committee positions is 455.[2]
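
The same kind of check works for the sub-committee count.  The sketch below is again only an illustration, with placeholder member names and Python 3.8 or later assumed.  It also shows why the number is smaller: every unordered group of three people can be arranged in 3! = 6 different orders, so dividing the number of ordered slates by 6 gives the number of sub-committees.

```python
import math
from itertools import combinations, permutations

# Hypothetical stand-ins for the 15 board members.
board = [f"member_{i}" for i in range(1, 16)]

# Each unordered group of three is one possible sub-committee.
committees = list(combinations(board, 3))

# Number of different orders in which one sub-committee could be listed.
orderings_per_committee = len(list(permutations(committees[0])))

print(len(committees))                             # brute-force count
print(math.comb(15, 3))                            # n! / (r! (n - r)!)
print(math.perm(15, 3) // orderings_per_committee)  # ordered slates / 3!
```

All three lines print 455.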

                          Does Order Matter?
Is Repetition Allowed?    Yes (Permutation)    No (Combination)
Yes                       n^r                  (n + r - 1)! / (r! (n - 1)!)
No                        n! / (n - r)!        n! / (r! (n - r)!)
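
One way to keep the four formulas straight is to put the two questions into a small function.  This is just a sketch: the function name count_selections and its arguments are my own invention, n is the number of items to choose from, r is the number chosen, and Python 3.8 or later is assumed.

```python
import math

def count_selections(n, r, order_matters, repetition_allowed):
    """Count the ways to choose r items from n, per the four table cells."""
    if order_matters and repetition_allowed:
        return n ** r                   # permutation with repetition
    if order_matters:
        return math.perm(n, r)          # n! / (n - r)!
    if repetition_allowed:
        return math.comb(n + r - 1, r)  # (n + r - 1)! / (r! (n - 1)!)
    return math.comb(n, r)              # n! / (r! (n - r)!)

# The board example (n = 15, r = 3) under each pair of answers.
for order_matters in (True, False):
    for repetition_allowed in (True, False):
        print(order_matters, repetition_allowed,
              count_selections(15, 3, order_matters, repetition_allowed))
```

For the board example this prints 3375, 2730, 680, and 455, in that order.  The second and fourth values are the two answers worked out in this article.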

Reinforcing with Repetition

As you can see, it is important to determine whether order matters.  In the first example, where it does, there are 2,730 permutations.  In the second example, where it doesn’t, there are only 455 combinations.  While we sometimes use the term combination loosely to describe sets of objects regardless of whether order matters, it’s important to remember that there is a difference between combinations and permutations and you can identify which one you’re dealing with by asking yourself whether the order of objects matters.  By understanding the difference between the two concepts and being able to use the correct formulas, you’ll be able to calculate the right quantities and make informed decisions based on accurate information.

[1] Permutation without repetition: n! / (n – r)! = 15! / (15 – 3)! = 15! / 12! = 15 * 14 * 13 = 2,730

[2] Combination without repetition: n! / (r! (n – r)!) = 15! / (3! (15 – 3)!) = (15 * 14 * 13) / (3 * 2 * 1) = 2,730 / 6 = 455