Foundations for Analytics with Python: From Non-programmer to Hacker

I’m excited to share that O’Reilly Media is about to publish my new book, Foundations for Analytics with Python: From Non-programmer to Hacker. The book is geared toward people who have no prior programming experience but deal with data every day and are interested in learning how to scale and automate their work.

Foundations for Analytics with Python by Clinton Brownley, PhD

I did not have a background in programming. I learned it on the job because I recognized it would enable me to automate repetitive actions and accomplish tasks that would be time-consuming or impossible with my current skill set. I read countless books, online tutorials, and blog posts in those first few weeks and months as I attempted to get my first program for work to do something useful for me. It’s difficult to fully describe how exhilarating and empowering it was when I finally got the program to work correctly. Needless to say, I was hooked, and I haven’t looked back.

I wrote the book with a few objectives in mind:

  • Be accessible to ambitious non-programmers
  • Be practical, so you can immediately see how you can use the code at work
  • Teach fundamental programming concepts and techniques and also provide alternative, shorter code that performs the same actions
  • Make the learning curve as short and shallow as possible so you can enjoy the fruits of your labor as quickly as possible

The book’s features reflect these objectives:

  • Each section focuses on one specific task, so you can learn how to accomplish that task without distractions
  • Each section is a complete, self-contained program, so you don’t have to remember to combine a bunch of code snippets to make them work
  • In the CSV and Excel chapters, each section of code has two versions, a base Python version and a Pandas version. The base Python version teaches you fundamental concepts and techniques. The Pandas version shortens and simplifies the code you need to write to accomplish the task
  • Uses the Anaconda Python 3 distribution, which bundles the newest version of Python with some of the most popular add-in packages
  • Includes screen shots of the input files, Python code, command line, and output files
  • Common data formats, including plain text, CSV, and Excel files, and databases
  • Common data processing tasks, including filtering for specific rows, selecting specific columns, and calculating summary statistics
  • Chapters on data analysis, plotting and graphing, and automation
  • Three real-world applications that illustrate how you can combine and extend techniques from earlier chapters to accomplish important data processing tasks
  • Both Windows and Mac commands and screen shots

To give you a feel for the book, let me provide a few sections of code from the book and the table of contents. The first section of code comes from the CSV chapter, the second section of code from the Excel chapter, and the third section of code from the Database chapter.  The brief comments after each section of code are for this blog post, they are not in the book.  If you want to see what other topics are included in the book, please see the table of contents at the bottom of this post.

Example Section #1: CSV Files

Reading and Writing a CSV File

Version #1: Base Python

#!/usr/bin/env python3
import csv
import sys

input_file = sys.argv[1]
output_file = sys.argv[2]

with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file, delimiter=',')
        filewriter = csv.writer(csv_out_file, delimiter=',')
        for row_list in filereader:
            filewriter.writerow(row_list)

Version #1 demonstrates how to read a CSV input file with base Python’s standard csv module and write the contents to a CSV output file. In the book, I explain every line of code. This first example gives you the ability to transfer all of your data to an output file. The subsequent examples in the chapter show you how to select specific data to write to the output file and how to process multiple CSV files.

Version #2: Pandas Add-in Module

#!/usr/bin/env python3
import sys
import pandas as pd

input_file = sys.argv[1]
output_file = sys.argv[2]

data_frame = pd.read_csv(input_file)
print(data_frame)
data_frame.to_csv(output_file, index=False)

Version #2 demonstrates how to accomplish the same task with Pandas. As you can see, you simply use read_csv to read the input file and to_csv to write to the output file.

Example Section #2: Excel Files

Reading and Writing an Excel Worksheet

Version #1: xlrd and xlwt Add-in Modules

#!/usr/bin/env python3
import sys
from xlrd import open_workbook
from xlwt import Workbook

input_file = sys.argv[1]
output_file = sys.argv[2]

output_workbook = Workbook()
output_worksheet = output_workbook.add_sheet('output_worksheet_name')

with open_workbook(input_file) as workbook:
    worksheet = workbook.sheet_by_name('input_worksheet_name')
    for row_index in range(worksheet.nrows):
        for column_index in range(worksheet.ncols):
            output_worksheet.write(row_index, column_index, \
                worksheet.cell_value(row_index, column_index))
output_workbook.save(output_file)

Version #1 demonstrates how to read and write an Excel worksheet with base Python and the xlrd and xlwt add-in modules. Again, this first example gives you the ability to transfer all of the data on one worksheet to an output file. The subsequent examples in the chapter show you how to select specific data to write to the output file, how to process multiple worksheets, and how to process multiple workbooks.

Version #2: Pandas Add-in Module

#!/usr/bin/env python3
import pandas as pd
import sys

input_file = sys.argv[1]
output_file = sys.argv[2]

data_frame = pd.read_excel(input_file, sheetname='input_worksheet_name')
writer = pd.ExcelWriter(output_file)
data_frame.to_excel(writer, sheet_name='output_worksheet_name', index=False)
writer.save()

Version #2 demonstrates how to accomplish the same task with Pandas. Again, you simply use read_excel to read the input worksheet and to_excel to write to the output worksheet.

Example Section #3: Databases

Query a table and write results to a file

#!/usr/bin/env python
import csv
import MySQLdb
import sys

output_file = sys.argv[1]

con = MySQLdb.connect(host='localhost', port=3306, db='my_suppliers', user='my_username', passwd='my_password')
c = con.cursor()

filewriter = csv.writer(open(output_file, 'wb'), delimiter=',')
header = ['Supplier Name','Invoice Number','Part Number','Cost','Purchase Date']
filewriter.writerow(header)

c.execute("""SELECT * FROM Suppliers WHERE Cost > 700.0;""")
rows = c.fetchall()
for row in rows:
    filewriter.writerow(row)

This example demonstrates how to connect to a database, query a table, and write the resulting data to a CSV output file. Other examples in the chapter explain how to load data into a database table from a file and update records in a table based on data in a file.

I hope these examples give you a feel for the book. If you want to see what other topics are included in the book, please see the table of contents shown below. Foundations for Analytics with Python is scheduled to be available in May 2016. Please keep an eye out for it, and if you know other people who may be interested please point them to this blog post and the Amazon link.  Thank you : )

 

TABLE OF CONTENTS

CHAPTER
Introduction
Why Read This Book/Why Learn These Skills
Who Is This Book For
Why Windows
Why Python
Install Anaconda Python
Text Editors
Download Book Materials
Base Python and Pandas
Overview of Chapters

CHAPTER
Python Basics
How To Create a Python Script
How To Run a Python Script
Numbers
Strings
Regular Expressions/Pattern Matching
Dates
Lists
Tuples
Dictionaries
Control Flow
Functions
Exceptions
Reading a Text File
Reading Multiple Text Files with Glob
Writing to a Text File
Writing to a Comma Separated Values “CSV” File
Print Statements

CHAPTER
Comma Separated Values “CSV” Text Files
Reading and Writing a CSV File (String Manipulation)
Reading and Writing a CSV File (Standard csv Module)
Filtering for Rows
    Value in Row Meets a Condition
    Value in Row is in a Set of Interest
    Value in Row Matches a Pattern (Regular Expression)
Selecting Columns
    Index Values
    Column Headings
Reading Multiple CSV Files
    Count Number of Files and Rows and Columns in Each File
    Concatenate Data From Multiple Files
    Sum and Average a Set of Values Per File
Selecting Contiguous Rows
Adding a Header Row

CHAPTER
Microsoft Excel Files
Introspecting an Excel Workbook
Reading a Single Worksheet
    Formatting Dates
    Filtering for Rows
        Value in Row Meets a Condition
        Value in Row is in a Set of Interest
        Value in Row Matches a Pattern (Regular Expression)
    Selecting Columns
        Index Values
        Column Headings
Reading All Worksheets
    Filtering for Rows from All Worksheets
    Selecting Columns from All Worksheets
Reading a Subset of Worksheets
    Filtering for Rows from Subset of Worksheets
    Selecting Columns from Subset of Worksheets
Reading Multiple Workbooks
    Count Number of Workbooks and Rows and Columns in Each Workbook
    Concatenate Data from Multiple Workbooks
    Sum and Average a Set of Values Per Worksheet Per Workbook

CHAPTER
Databases
Python’s Standard sqlite3 Module
    Create a Database
    Create a Database Table
    Insert Hand-written Data into a Database Table
    Query a Database Table
    Insert Data from a CSV File into a Database Table
    Update Records in a Database Table with Data from a CSV File
MySQL Database
    Create a Database
    Create a Database Table
    Insert Data from a CSV File into a Database Table
    Query a Database Table and Write Output to a CSV File
    Update Records in a Database Table with Data from a CSV File

CHAPTER
Applications
Find a Set of Items in a Large Collection of Excel and CSV Files
Parse a CSV File and Calculate a Statistic for Any Number of Categories
Parse a Text File and Calculate a Statistic for Any Number of Categories

CHAPTER
Graphing and Plotting
matplotlib
pandas
ggplot
seaborn

CHAPTER
Data Analysis
Descriptive statistics
Regression
Classification

CHAPTER
Automation
Windows: scheduled tasks
Mac: cron jobs

CHAPTER
Conclusion
Where To Go From Here
    Additional Built-Ins/Standard Modules
    Additional Add-In Modules
    Data Structures
How To Go From Here

APPENDIX
Downloads
Python
xlrd
mysqlclient/MySQL-python/MySQLdb
MySQL

Advertisements

Too Many Files? Feed Them to Python’s Voracious Glob Module

In my previous post I mentioned that a coworker had recently emailed me a folder full of over two hundred Excel files and asked me to extract some relevant data from each file. I noted how undertaking that task manually would have been time-consuming and error-prone and described how exhilarating it was to accomplish the task quickly by writing some Python code.

I didn’t show how to process multiple files with Python in that post because it is easier to understand the code for processing multiple files once you’re familiar with code for processing one file. So that’s why in that post I demonstrated how to read and write a single CSV file with Python. With that knowledge under our belts, we’re now prepared to understand Python code for processing multiple CSV files.

One good way to learn to code in Python is to create small datasets on your laptop and then write a Python script to process or manipulate them in some way, so that’s what we’ll do here. The following example demonstrates one way to read multiple CSV files (with similarly formatted data), concatenate the data from the files, and write the results to an output file.

One assumption I make in this example is that you’ve already visited http://www.python.org/ and downloaded and installed the version of Python that is compatible with your computer’s operating system. Another assumption is that all of the input files are located in the same folder. Also, unlike in my previous post, the script in this example can handle commas embedded in column values because it imports Python’s built-in csv module, which makes it easier to handle numbers with embedded commas, e.g. $1,563.25.

Ok, in order to process multiple CSV files we need to create multiple CSV files. Open Microsoft Excel and add the following data:

Figure I: 1st Input File - sales_january2014.csv

Figure I: 1st Input File – sales_january2014.csv

Now open the ‘Save As’ dialog box. In the location box, navigate to your Desktop so the file will be saved on your Desktop. In the format box, select ‘Comma Separated Values (.csv)’ so that Excel saves the file in the CSV format. Finally, in the ‘Save As’ or ‘File Name’ box, type “sales_january2014”. Click ‘Save’.

Ok, that’s one input file. Now let’s create a second input file. Open a new Excel workbook and add the following data:

Figure II: 2nd Input File - sales_february2014.csv

Figure II: 2nd Input File – sales_february2014.csv

Now open the ‘Save As’ dialog box. In the location box, navigate to your Desktop so the file will be saved on your Desktop. In the format box, select ‘Comma Separated Values (.csv)’ so that Excel saves the file in the CSV format. Finally, in the ‘Save As’ or ‘File Name’ box, type “sales_february2014”. Click ‘Save’. Ok, now we have two CSV input files, one for January and one for February. We’ll stick with two input files in this example to keep it simple, but please keep in mind that the code in this example can handle many more files; that is, it will scale well.

Now that we have two CSV files to work with, let’s create a Python script to read the files and write their contents to an output file. Open your favorite text editor (e.g. Notepad) and add the following lines of code:

#!/usr/bin/python
import csv
import glob
import os
import sys

input_path = sys.argv[1]
output_file = sys.argv[2]

filewriter = csv.writer(open(output_file,’wb’))
file_counter = 0
for input_file in glob.glob(os.path.join(input_path,’*.csv’)):
        with open(input_file,’rU’) as csv_file:
                filereader = csv.reader(csv_file)
                if file_counter < 1:
                        for row in filereader:
                                filewriter.writerow(row)
                else:
                        header = next(filereader,None)
                        for row in filereader:
                                filewriter.writerow(row)
        file_counter += 1

The first line is a comment line that makes the script transferable across operating systems. The next four lines import additional built-in Python modules so that we can use their methods and functions. You can read more about these and other built-in modules at: http://docs.python.org/2/library/index.html.

The sixth line uses argv from the sys module to grab the first piece of information after the script name on the command line, the path to and name of the input folder, and assigns it to the variable input_path. Similarly, the seventh line grabs the second piece of information after the script name, the path to and name of the output file, and assigns it to the variable output_file.

The eighth line uses the csv module to open the output file in write ‘w’ mode and create a writer object, filewriter, for writing to the output file. The ‘b’ enables a distinction between binary and text files for systems that differentiate between binary and text files, but for systems that do not, the ‘b’ has no effect. The ninth line creates a variable, file_counter, to store the count of the number of files processed and initializes it to zero.

The tenth line creates a list of the input files to be processed and also starts a “for” loop for looping through each of the input files. There is a lot going on in this one line, so let’s talk about how it works. os.path.join joins the two components between its parentheses. input_path is the path to the folder that contains the input files and ‘*.csv’ represents any file name that ends in ‘.csv’.

glob.glob expands the asterisk ‘*’, a Unix Shell wildcard character, in ‘*.csv’ into the actual file name. Together, glob.glob and os.path.join create a list of our two input files, e.g. [‘C:\Users\Clinton\Desktop\sales_january2014.csv’, ‘C:\Users\Clinton\Desktop\sales_february2014.csv’]. Finally, the “for” loop syntax executes the lines of code beneath this line for each of the input files in this list.

The eleventh line uses a “with” statement to open each input file in read ‘r’ mode. The ‘U’ mode helps recognize newlines in case your version of Python is built without universal newlines. The twelfth line uses the csv module to create a reader object, filereader, for reading each input file.

The thirteenth line creates an “if-else” statement for distinguishing between the first input file and all subsequent input files. The first time through the “for” loop file_counter equals zero, which is less than one, so the “if” block is executed. The code in the “if” block writes every row of data in the first input file, including the header row, to the output file.

At the bottom of the “for” loop, after processing the first input file, we add one to file_counter. Therefore, the second time through the “for” loop file_counter is not less than one, so the “else” block is executed. The code in the “else” block uses the csv module’s next() method to read the first row, i.e. the header row, of the second and subsequent input files into the variable, header, so that it is not written to the output file. The remaining code in the “else” block writes the remaining rows in the input file, the rows of data beneath the header row, to the output file.

Now that we understand what the code is supposed to do, let’s save this file as a Python script and use it to process our two input files. To save the file as a Python script, open the ‘Save As’ dialog box. In the location box, navigate to your Desktop so the file will be saved on your Desktop. In the format box, select ‘All Files’ so that the dialog box doesn’t select a specific file type. Finally, in the ‘Save As’ or ‘File Name’ box, type process_many_csv_files.py. Click ‘Save’. Now you have a Python script you can use to process multiple CSV files.

Figure III: Python Script - process_many_csv_files.py

Figure III: Python Script – process_many_csv_files.py

To use process_many_csv_files.py to read and write the contents of our two input files, open a Command Prompt (Windows) or Terminal (Mac) window. When the window opens the prompt will be in a particular folder, also known as a directory (e.g. “C:\Users\Clinton\Documents”). The next step is to navigate to the Desktop, where we saved the Python script.

To move between folders, you can use the ‘cd’ command, a Unix command which stands for change directory. To move up and out of the ‘Documents’ folder into the ‘Clinton’ folder, type the following and then hit Enter:

cd ..

That is, the letters ‘cd’ together followed by one space followed by two periods. The two periods ‘..’ stand for up one level. At this point, the prompt should look like “C:\Users\Clinton”. Now, to move down into a specific folder you use the same ‘cd’ command followed by the name of the folder you want to move into. Since the ‘Desktop’ folder resides in the ‘Clinton’ folder, you can move down into the ‘Desktop’ folder by typing the following and then hitting Enter:

cd Desktop

At this point, the prompt should look like “C:\Users\Clinton\Desktop” in the Command Prompt and we are exactly where we need to be since this is where we saved the Python script and two CSV input files. The next step is to run the Python script.

To run the Python script, type one of the following commands on the command line, depending on your operating system, and then hit Enter:

Windows:
python process_many_csv_files.py . sales_summary.csv

That is, type python, followed by a single space, followed by process_many_csv_files.py, followed by a single space, followed by a single period, followed by a single space, followed by sales_summary.csv, and then hit Enter. The single period refers to the current directory, your Desktop (i.e. the folder that contains your two input files).

Mac:
chmod +x process_many_csv_files.py
./process_many_csv_files.py . sales_summary.csv

That is, type chmod, followed by a single space, followed by +x, followed by a single space, followed by process_many_csv_files.py, and then hit Enter. This command makes the Python script executable. Then type ./process_many_csv_files.py, followed by a single space, followed by a single period, followed by a single space, followed by sales_summary.csv, and then hit Enter:

After you hit Enter, you should not see any new output in the Command Prompt or Terminal window. However, if you minimize all of your open windows and look at your Desktop there should be a new CSV file called sales_summary.csv. Open the file. The contents should look like:

Figure IV: Output File - sales_summary.csv

Figure IV: Output File – sales_summary.csv

As you can see, a single header row and the six rows of data from the two input files were successfully written to the output file, sales_summary.csv. Often, this procedure of concatenating multiple input files into a single output file is all you need to do to begin your analysis. However, sometimes you may not need all of the rows or columns in the output file. Or you may need to modify the data or perform a calculation before writing it to the output file. In many cases, you would only need to make slight modifications to the code discussed above to alter the data written to your output file.

In this example there were only two input files, but the code generalizes to basically as many input files as your computer can handle. So if you need to concatenate the data in a few dozen, hundred, or thousand CSV files the code is basically re-useable as-is. This ability to automate and scale repetitive procedures on files and data is one of the great advantages of learning to code (even a little bit).

Now you know that it only takes a few lines of code to automate a process that, given a larger number of files, would be time-consuming and error-prone or even infeasible. Being able to quickly accomplish a task that would be difficult or impossible to do manually is empowering. Add in the benefit of eliminating manual “copy/paste” errors (once you debug your code), and the new capability is really exciting. Having read this post, I hope you’re now more familiar with Python and eager to begin using it. If you have any questions, please reply to this post.

The Two Rs of Python: Reading and Writing CSV Files

A coworker recently emailed me a folder full of over two hundred Excel files and asked me to extract some relevant data from each file.  A few years ago that task would have been daunting.  Like many other business people, I would have had to manually open each of the files and extract (copy and paste) the relevant data into a new output file.  With so many files, the undertaking would have required a lot of time and been prone to errors, even with careful attention to detail.

Thankfully, I had already learned how to code in Python, an easy-to-learn programming language that is great for processing files and manipulating data.  With Python, I was able to write a short script that read each of the files, extracted the relevant data from each file, and wrote that data to an output file.  The whole process took a couple of hours, compared to the days or weeks it would have taken to do it manually.

This type of capability, reducing the time it takes to complete a task from weeks to hours, is exhilarating.   That’s one of the reasons why I’m going to start including “analytics” topics like this one in this blog.  Another benefit of learning to do your work with Python is that the method scales.  As I described in my story, as the number of files grows it simply becomes more and more difficult to complete the work in a timely fashion.  Once you learn how to use Python to process and manipulate files and data, you won’t want to return to tiresome manual processes.

One of the ways I learned to code in Python was to create small datasets in a folder on my laptop and then write a Python script to process or manipulate them in some way.  Since I found this method to be really helpful, that’s how I’m going to present the example in this blog post.  Processing multiple files isn’t that complicated once you understand how all of the lines of code work together, but it does involve several concepts that are easier to understand once you know how to process a single file, so that’s where we’ll start.

The following example demonstrates one way to read a single comma separated values (CSV) file and write its contents to an output file.  One assumption I make in this example is that you’ve already visited http://www.python.org/ and downloaded and installed the version of Python that is compatible with your computer’s operating system.  Another assumption is that none of the values in the spreadsheet contains commas.  Ok, in order to work with a CSV file we need to create a CSV file.  Open Microsoft Excel and add the following data:

Figure I: Input File - customers.csv

Figure I: Input File – customers.csv

Now open the ‘Save As’ dialog box.  In the location box, navigate to your Desktop so the file will be saved on your Desktop.  In the format box, select ‘Comma Separated Values (.csv)’ so that Excel saves the file in the CSV format.  Finally, in the ‘Save As’ or ‘File Name’ box, type “customers”.  Click ‘Save’.  That’s all there is to it.  Now you have a CSV file called customers.csv that you can read into Python.

Now that we have a CSV file to work with, let’s create a Python script to read the file and write its contents to an output file.  Open your favorite text editor (e.g. Notepad) and add the following lines of code:

#!/usr/bin/python
import string
import sys

input_file = sys.argv[1]
output_file = sys.argv[2]

filewriter = open(output_file, ‘wb’)

with open(input_file, ‘rU’) as filereader:
        for row in filereader:
                row = row.strip()
                row_list = row.split(‘,’)
                filewriter.write(‘,’.join(map(str,row_list))+’\n’)
filewriter.close()

The first line is a comment line that makes the script transferable across operating systems.  The second line imports Python’s built-in string module so that we can use its methods and functions, e.g. strip() and split(), to manipulate strings.  The third line imports Python’s built-in sys module so that we can supply additional information to the script on the command line in either the Command Prompt (Windows) or Terminal (Mac) window.

The fourth line uses argv from the sys module to grab the first piece of information after the script name on the command line, the path to and name of the input file, and assigns it to the variable input_file.  Similarly, the fifth line grabs the second piece of information after the script name, the path to and name of the output file, and assigns it to the variable output_file.

The sixth line opens the output file in write ‘w’ mode and creates a writer object, filewriter, for writing to the output file.  The ‘b’ enables a distinction between binary and text files for systems that differentiate between binary and text files, but for systems that do not, the ‘b’ has no effect.

The seventh line uses a “with” statement to open the input file in read ‘r’ mode and creates a reader object, filereader, for reading the input file.  The ‘U’ mode helps recognize newlines in case your version of Python is built without universal newlines.

The eighth line uses a “for” loop to loop through the rows in the input file one by one.  The next three lines of code occur for each row in the input file since they’re contained in the body of the “for” loop (you can tell they’re in the body of the loop because they’re indented).  Each row of data in the input file enters the “for” loop as a string that contains column values separated by commas.

The ninth line uses the string module’s strip() function to remove whitespace, tabs, and newline characters from both ends of the string and then assigns that stripped version of the string to the variable row.  The tenth line uses the string module’s split() function to split the string on the embedded commas into a list of column values and then assigns the list to the variable row_list.

The eleventh line uses the map() and str() functions to convert each of the values in row_list to a string.  Then the join() function adds commas between the strings and joins all of the strings and commas together into a single string.  Next, a newline character is added to the end of the string.  Then the filewriter’s write() method writes the string to the output file.  This process of stripping, splitting, manipulating, and writing occurs for each of the rows in the input file.  Finally, in the twelfth line, the close() method closes the filewriter object.

Now that we understand what the code is supposed to do, let’s save this file as a Python script and use it to process the data in our input file, customers.csv.  To save the file as a Python script, open the ‘Save As’ dialog box.  In the location box, navigate to your Desktop so the file will be saved on your Desktop.  In the format box, select ‘All Files’ so that the dialog box doesn’t select a specific file type.  Finally, in the ‘Save As’ or ‘File Name’ box, type process_csv_file.py.  Click ‘Save’.  Now you have a Python script you can use to read and write the contents of a CSV file.

Figure II: Python Script - process_csv_file.py

Figure II: Python Script – process_csv_file.py

To use process_csv_file.py to read and write the contents of customers.csv, open a Command Prompt (Windows) or Terminal (Mac) window.  When the window opens the prompt will be in a particular folder, also known as a directory (e.g. “C:\Users\Clinton\Documents”).  The next step is to navigate to the Desktop, where we saved the Python script.

To move between folders, you can use the ‘cd’ command, a Unix command which stands for change directory.  To move up and out of the ‘Documents’ folder into the ‘Clinton’ folder, type the following and then hit Enter:

cd ..

That is, the letters ‘cd’ together followed by one space followed by two periods.  The two periods ‘..’ stand for up one level.  At this point, the prompt should look like “C:\Users\Clinton”.  Now, to move down into a specific folder you use the same ‘cd’ command followed by the name of the folder you want to move into.  Since the ‘Desktop’ folder resides in the ‘Clinton’ folder, you can move down into the ‘Desktop’ folder by typing the following and then hitting Enter:

cd Desktop

At this point, the prompt should look like “C:\Users\Clinton\Desktop” in the Command Prompt and we are exactly where we need to be since this is where we saved the Python script and CSV input file.  The next step is to run the Python script.

To run the Python script, type one of the following commands on the command line, depending on your operating system, and then hit Enter:

Windows:

python process_csv_file.py customers.csv my_output.csv

That is, type python, followed by a single space, followed by process_csv_file.py, followed by a single space, followed by customers.csv, followed by a single space, followed by my_output.csv, and then hit Enter:

Mac:

chmod +x process_csv_file.py

./process_csv_file.py customers.csv my_output.csv

That is, type chmod, followed by a single space, followed by +x, followed by a single space, followed by process_csv_file.py, and then hit Enter.  This command makes the Python script executable.  Then type ./process_csv_file.py, followed by a single space, followed by customers.csv, followed by a single space, followed by my_output.csv, and then hit Enter:

After you hit Enter, you should not see any new output in the Command Prompt or Terminal window.  However, if you minimize all of your open windows and look at your Desktop there should be a new CSV file called my_output.csv.  Open the file.  The contents should look like:

Figure III: Output file - my_output.csv

Figure III: Output File – my_output.csv

As you can see, the data in the input file, customers.csv, was successfully written to the output file, my_output.csv.  This example demonstrates the basic process of reading a CSV file and writing data to an output file.  The example is slightly unrealistic in that there isn’t any reason to re-write the data in the input file to an output file.  However, it’s easy to imagine business logic that would make the output more interesting.  Perhaps you only need the rows where the cost is greater than a specific threshold or where the purchase date is before a particular cut-off date.  You would only need to make slight modifications to the code discussed above to add this business logic to your script.

This blog post covered processing a single CSV file, but one of the real advantages of learning to code in Python is the ability to automate and scale your processes to handle many files at a time.  In a future post, I’ll cover modifications you can make to the code discussed in this post to create a Python script that processes multiple CSV files.  Having read this post, I hope you’re now more familiar with Python and excited to start using it.  If you have any questions, please reply to this post.