From 5ba1f9df32ef99e672fdc8a9fe169426b2a17ebe Mon Sep 17 00:00:00 2001 From: benfry Date: Sun, 17 Jun 2007 22:20:09 +0000 Subject: [PATCH] cleaning up movie issues, closing in on release for 0125 --- build/linux/run.sh | 5069 ++++++++++++++++++++ build/shared/revisions.txt | 29 +- core/todo.txt | 11 +- todo.txt | 52 +- video/src/processing/video/Capture.java | 18 +- video/src/processing/video/Movie.java | 3 +- video/src/processing/video/MovieMaker.java | 10 +- 7 files changed, 5147 insertions(+), 45 deletions(-) diff --git a/build/linux/run.sh b/build/linux/run.sh index 29ccd3aa9..a85252297 100755 --- a/build/linux/run.sh +++ b/build/linux/run.sh @@ -1,3 +1,5072 @@ + + + + +4 +5 Connections and Correlations +What is the question? + +In 2004, the Boston Red Sox won the World Series after an 86-year hiatus. As a Red Sox fan, I found this somewhat bittersweet in the sense that the second highest paid team in baseball finally managed to win a championship. With a total salary of around $133 million, they weren't exactly young upstarts. This made me curious about how that works across the league, with raw salaries and the general performance of the individual teams. + + +George Steinbrenner, the owner of the New York Yankees, for instance, had in recent years been accused of trying to "buy" the World Series trophy by assembling a collection of highly paid all-stars. On the other hand, the performance of the Oakland A's had in years prior far exceeded their overall salary. This story was told by Michael Lewis in the book Moneyball, which covered how Billy Beane, the General Manager (GM) of the A's, made use of statistics to pursue players who had promising numbers but were below the radar because they weren't always standouts in the traditional sense. + + +Bill James was one of the first to bring statistics-oriented thinking to baseball with his writing and later when he began publishing The Bill James Baseball Abstract in 1977. 
Similar ideas led to the founding of the Society for American Baseball Research (SABR), from which the term Sabermetrics was coined, to refer to this numbers-driven approach to the game. The extent to which statistics can be used remains a controversial topic in sports, a battle between the perception that statistics water the game down to mere mathematics and a focus on the intangible aspects of talented people playing team sports. + + +As with any narrative, Moneyball presents an over-simplification of the system, as does relating total salary in a given year to performance-to-date. There are more complex factors including how contracts work over multiple years, the health of a team's farm system, and scoring methods for individual players that could be taken into consideration. The original version of this project was thrown together while watching a game on television and it's perhaps dangerously un-advanced, given the amount of time and energy that's put into the analysis of sports statistics. + + +However, a win is a win, and as a gross measure, showing the simple correlation can be quite revealing, in particular to observe shifts over the course of a season. Non-baseball fans also seem to enjoy it because wins and losses are straightforward, and they're probably also aware of ever-growing salaries paid to professional athletes. + +Approach + +(this needs more work, but will cover a bit about what we're about to do) + + +up/down characteristics, will get into in the representation... + + +in this chapter we'll cover + + +correlations and parallel coordinates + + +parsing online data set, detective work to figure out how it goes + + +dynamically pulling data for multiple days (things start to move) + + +label placement (refinement) + +Preprocessing: Acquiring Win/Loss Data + +To find win/loss records for each team, we turn to MLB.com, the web site of Major League Baseball. 
The standings page is a suitable place to get the information: + + + +http://mlb.mlb.com/mlb/standings/index.jsp + + + +Figure 01 The MLB.com standings page for 2007 + + +Pulling the standings information from this page requires a little bit of detective work. The general process to follow when dealing with web pages: + + +Navigate to the page that contains the data. Because there will be lots of header and footer material, make a note of identifying factors where the data starts. In this case, the table starts near the text "American League," and specifically, beneath "East" for the table heading. + + +Choose View Source and take a look at the code. Use Find to look for your identifiers (American League or East) to see where the data begins. It's important to choose good identifiers. It might be tempting to use "Boston," the first data element in the table. However, in this case that will not work, because the "Select favorite team" pop-up menu already contains a "Boston" entry that will throw off your search. + + +In most cases, the data will begin near the identifier you've chosen. The identifying portions will be part of an HTML <TABLE> tag, with the data stored inside a <TD> and </TD> pair. + + +To make things trickier, in this example, the relevant HTML is actually built using JavaScript. 
The location near "American League," where we might normally find the data, instead contain lines that use a pair of functions named buildTitleRows() and buildRows(): + + + +<div style="padding-top:15px;"> + + + +<h1>American League</h1> + + + +<img src="/mlb/images/al_symbol.gif" width="38" height="31" alt=" + + + +</div> + + + +</td> + + +</tr> + + + + + + +<script>dataExists();</script> + + + + + +<script> + + +<tbody id="ale"><script> + + + + + + +<script> + + +<tbody id="alc"><script> + + + + + + +<script> + + +<tbody id="alw"><script> + + + + + + +<tr> + + + +<td colspan="16" style="padding-top:15px;"> + + + +<h1> + + + +<img src="/mlb/images/nl_symbol.gif" width="38" height="31" t="National League" border="0" align="absmiddle" /> + + + +</td> + + +</tr> + + + + + + +<script> + + +<tbody id="nle"><script> + + + + + +<script> + + +<tbody id="nlc"><script> + + + + + + +<script> + + +<tbody id="nlw"><script> + + +An educated guess will tell you that ale is an abbreviation for the American League East division, alc stands for American League Central, and so on. (A less educated guess could ascertain the same by noting that this is the American League (AL) table, with subheadings East, Central, and West, abbreviated E, C, and W). + + +Next thing to figure out is where the standings_rs_ale array is created, along with five others like it. 
Another use of the Find command reveals lines that load each array from individual .js (JavaScript) files: + + +<script src="/components/game/year_2007/month_04/day_15/ + + +<script src="/components/game/year_2007/month_04/day_15/ + + +<script src="/components/game/year_2007/month_04/day_15/ + + +<script src="/components/game/year_2007/month_04/day_15/ + + +<script src="/components/game/year_2007/month_04/day_15/ + + +<script src="/components/game/year_2007/month_04/day_15/ + + +The URL for the first item reads: + + +/components/game/year_2007/month_04/day_15/standings_rs_ale.js + + +Because a forward slash is found at the beginning, the reference points to the root of the site, meaning that the full URL is: + + + +http://mlb.mlb.com + + + +Note: If the text did not begin with a slash, the URL would instead be relative, to the original page (http://mlb.mlb.com/mlb/standings/index.jsp) which would make the new URL {{http://mlb.mlb.com/mlb/standings/ + + +For the six divisions, the URLs are then: + + +// American League (AL) + + + +http://mlb.mlb.com/components/game/year_2007/month_04/day_15/standings_rs_ale.js + + + + +http://mlb.mlb.com/components/game/year_2007/month_04/day_15/standings_rs_alc.js + + + + +http://mlb.mlb.com/components/game/year_2007/month_04/day_15/standings_rs_alw.js + + + + + + +// National League (NL) + + + +http://mlb.mlb.com/components/game/year_2007/month_04/day_15/standings_rs_nle.js + + + + +http://mlb.mlb.com/components/game/year_2007/month_04/day_15/standings_rs_nlc.js + + + + +http://mlb.mlb.com/components/game/year_2007/month_04/day_15/standings_rs_nlw.js + + +Preprocessing: Parsing the Win/Loss files (Parse & Filter) + +Entering the first URL for standings_rs_ale.js into a browser will display the JavaScript source file that creates the standings_rs_ale array: + + +var standings_rs_ale = [{ + + + +w: '6', + + + +elim: '-', + + + +rs: '51', + + + +div: 'ale', + + + +gameid: '2007_04_16_anamlb_bosmlb_1', + + + +status: 'F', + + + +pre: 
null, + + + +last10: '6-4', + + + +onerun: '1-0', + + + +xtr: '0-0', + + + +nextg: '4/16 v LAA, W 7-2', + + + +vsW: '4-3', + + + +ra: '28', + + + +gb: '-', + + + +wrap: '/NASApp/mlb/news/wrap.jsp?ymd=20070414&content_id=1898390&vkey=wrapup2005&fext=.jsp&c_id=mlb', + + + +home: '3-1', + + + +code: 'bos', + + + +pct: '.600', + + + +league_sensitive_team_name: 'Boston', + + + +vsC: '2-1', + + + +vsE: '0-0', + + + +vsR: '5-4', + + + +vsL: '1-0', + + + +xwl: '7-3', + + + +strk: 'W2', + + + +l: '4', + + + +lastg: '4/14 v LAA, W 8-0', + + + +interleague: '0-0', + + + +team: 'Boston', + + + +road: '3-3' + + +}, { + + +Web developers might recognize this as JavaScript Object Notation (JSON) syntax. We won't get into the specifics of JSON here; see the Parse chapter for more information about how it works. + + +This is the content for the first team, and four additional blocks like this one follow. Only a few pieces of information are needed from this file. Most useful will be the two- or three-letter code used to identify the team (because this will later be used to index other kinds of data) from the line that reads: + + +code: 'bos', + + +Next the line for wins: + + +w: '6', + + +and for losses: + + +l: '4', + + +We will also want a team name to show in the interface, and luckily there is a variable named team that looks like it might do the trick. However, it lists New York as the value for the New York Yankees, which won't be useful when trying to differentiate the Yankees from the Mets, who also hail from New York. Instead, the league_sensitive_team_name value will be more useful. For instance, the entry for the Mets reads: + + +league_sensitive_team_name: 'NY Mets', + + +The line between two teams' data begins with a } character (as in the }, { line shown above), so each time a line starting with that character is found, the values collected so far can be recorded for that team. Grabbing the data for all of the teams is simply a matter of parsing this information properly. 
The following code reads one of the files, and parses the data into + + +Note: It would also be possible to use a proper JSON parser to read the data, but because the data shown here is so simple, using the parser would be overkill, making the program run more slowly, and increasing its download size. + + +Introducing Regular Expressions + + +The following function will read from one of these .js files and print each team code that it finds, followed by the win-loss record for that team. The code introduces regular expressions, which are extremely useful when parsing data. + + +void parseWinLoss(String[] lines) { + + + +Pattern p = Pattern.compile("\\s+([\\w\\d]+):\\s'(.*)',?"); + + + + + + + +String teamCode = ""; + + + +int wins = 0; + + + +int losses = 0; + + + + + + +for (int i = 0; i < lines.length; i++) { + + + +Matcher m = p.matcher(lines[i]); + + + + + + +if (m.matches()) { + + + +String attr = m.group(1); + + + +String value = m.group(2); + + + + + + +if (attr.equals("code")) { + + + +teamCode = value; + + + +} else if (attr.equals("w")) { + + + +wins = int(value); + + + +} else if (attr.equals("l")) { + + + +losses = int(value); + + + +} + + + + + + +} else { + + + +if (lines[i].startsWith("}")) { + + + +// This is the end of a group, print the values + + + +println(teamCode + " " + wins + "-" + losses); + + + +} + + + +} + + + +} + + +} + + +Looking at the original data, the basic format of a line is as follows: + + +[space] [attribute name] : [space] ' [value] ' , + + +This sort of template is common when parsing data, and can be handled with a regular expression (or regexp). A regexp is defined by a pattern, such as the one above, and a matcher, which checks the pattern against some input data. A pattern is made up of a series of symbols that identify white space, characters, numbers and how many of each are expected. 
The symbols are initially a confusing mess, but after some time they will become familiar (they'll be less confusing, even if they still look like a mess). The pattern \\s+([\\w\\d]+):\\s'(.*)',? used above identifies the following: + + +\\s+ – This part matches the whitespace at the beginning of the line. The \s pattern means "any whitespace character." Because \ is used to identify special characters in a String (e.g. \t for TAB or \n for newline), an actual backslash is specified by a double backslash: \\. The + symbol at the end means to look for one or more of the item that precedes it, in this case whitespace characters. + + +([\\w\\d]+) – This portion matches the name of the variable (such as w or team). The \w pattern specifies any word character (letters, digits, and the underscore). The \d pattern specifies digits (0-9); because \w already includes digits, it is redundant here, though harmless. Brackets enclose a set of possible characters, so in this case [\\w\\d] means "any word or digit character." The + at the end specifies "one or more." The entire grouping inside parentheses means to mark that set of characters as a group. This means that the matching characters can later be extracted. + + +: – This is literally just the colon character, found after the variable name. + + +\\s – Match a single whitespace character, the space after the colon. + + +' – Matches the single quote at the beginning of the variable's value. + + +(.*) – This part matches the value found inside the single quotes. The . operator matches "anything." Any character is possible. The * pattern specifies zero or more of the operator that precedes it (similar to how the + operator matches one or more). The parentheses mark this as the second grouping to be retrieved later. + + +' – The closing single quote after the variable's value. + + +,? – Matches the optional comma at the end of the line. Similar to + and *, the ? modifier specifies "zero or one" matches. + + +To use a regexp, first create a Pattern object, as seen in the first line of the method. Next we will iterate through each line of the input data and attempt to match it to the Pattern. 
Inside the loop, the Matcher object handles testing the pattern. The matches() method returns true if the specified lines[i] value fits the pattern. Next, the group() method is used to retrieve each group (which corresponds to the information found inside a pair of parentheses in the pattern). The first group is the attribute (or variable name) and the second group is the value (the variable contents). + + +If the line does not match, the final part of the method checks whether the line begins with a } character, which specifies a break between data from two teams, at which point the values collected so far are printed to the console with println. + + +A complete program to acquire and parse the data for all six divisions from MLB.com follows. It creates two text files, one that contains the standings, and a second for the team codes and the team names. + + +import java.util.regex.*; + + + + + +PrintWriter standings; + + +PrintWriter teams; + + + + + + + + +void setup() { + + + +String base = "http://mlb.mlb.com/components/game" + + + + +"/year_2007/month_04/day_15/"; + + + + + + + +standings = createWriter("standings.tsv"); + + + +teams = createWriter("teams.tsv"); + + + + + + + +parseWinLoss(loadStrings(base + "standings_rs_ale.js")); + + + +parseWinLoss(loadStrings(base + "standings_rs_alw.js")); + + + +parseWinLoss(loadStrings(base + "standings_rs_alc.js")); + + + + + + +parseWinLoss(loadStrings(base + "standings_rs_nle.js")); + + + +parseWinLoss(loadStrings(base + "standings_rs_nlw.js")); + + + +parseWinLoss(loadStrings(base + "standings_rs_nlc.js")); + + + + + + + +// Finish writing and close each file. 
+ + + +standings.flush(); + + + +standings.close(); + + + +teams.flush(); + + + +teams.close(); + + + + + + + +println("Done."); + + +} + + + + + + + + +void parseWinLoss(String[] lines) { + + + +Pattern p = Pattern.compile("\\s+([\\w\\d]+):\\s'(.*)',?"); + + + + + + + +String teamCode = ""; + + + +int wins = 0; + + + +int losses = 0; + + + +String teamName = ""; + + + + + + +for (int i = 0; i < lines.length; i++) { + + + +Matcher m = p.matcher(lines[i]); + + + + + + +if (m.matches()) { + + + +String attr = m.group(1); + + + +String value = m.group(2); + + + + + + +if (attr.equals("code")) { + + + +teamCode = value; + + + +} else if (attr.equals("w")) { + + + +wins = int(value); + + + +} else if (attr.equals("l")) { + + + +losses = int(value); + + + +} else if (attr.equals("league_sensitive_team_name")) { + + + +teamName = value; + + + +} + + + + + + + +} else { + + + +if (lines[i].startsWith("}")) { + + + +// This is the end of a group, print the values + + + +standings.println(teamCode + TAB + wins + TAB + losses); + + + +teams.println(teamCode + TAB + teamName); + + + +} + + + +} + + + +} + + +} + + + + + +The resulting standings.tsv file reads: + + +bos 6 4 + + +tor 7 5 + + +bal 6 6 + + +nyy 5 6 + + +tb 5 7 + + +sea 5 3 + + +ana 6 6 + + +oak 6 7 + + +tex 5 7 + + +cle 6 3 + + +det 7 5 + + +min 7 5 + + +cws 5 6 + + +kc 3 9 + + +atl 8 3 + + +nym 7 4 + + +fla 6 5 + + +phi 3 8 + + +was 3 9 + + +ari 9 4 + + +la 8 4 + + +sd 7 5 + + +col 5 7 + + +sf 3 7 + + +cin 7 5 + + +mil 6 5 + + +stl 6 5 + + +hou 4 6 + + +pit 4 6 + + +chc 4 7 + + +And the teams.tsv file contains: + + +bos Boston + + +tor Toronto + + +bal Baltimore + + +nyy NY Yankees + + +tb Tampa Bay + + +sea Seattle + + +ana LA Angels + + +oak Oakland + + +tex Texas + + +cle Cleveland + + +det Detroit + + +min Minnesota + + +cws Chi White Sox + + +kc Kansas City + + +atl Atlanta + + +nym NY Mets + + +fla Florida + + +phi Philadelphia + + +was Washington + + +ari Arizona + + +la LA Dodgers + + +sd San Diego + + 
+col Colorado + + +sf San Francisco + + +cin Cincinnati + + +mil Milwaukee + + +stl St. Louis + + +hou Houston + + +pit Pittsburgh + + +chc Chi Cubs + + +The team names file can be downloaded here: + + + +http://benfry.com/book/salaryper/teams.tsv + + + +along with the example standings file: + + + +http://benfry.com/book/salaryper/standings.tsv + + + +The code downloads each file for April 15, 2007, but changing to another date is a simple matter. To use the current date, use a combination of the year(), month(), and day() methods along with nf() to pad the numbers to the proper number of digits: + + +String base = "http://mlb.mlb.com/components/game" + + + + +"/year_" + nf(year(), 4) + + + + +"/month_" + nf(month(), 2) + + + + +"/day_" + nf(day(), 2) + "/"; + +Preprocessing: Acquiring Team Logos (Acquire, Refine) + +Plain text for the team names is not particularly appealing; it's much more tempting to use the actual team logos because they're so closely associated with each team. Finding team logos on the MLB site (or any other site, for that matter) illustrates another bit of useful detective work, in this case trying to determine the pattern for a series of image files. + + +The first thing to do is to find a possible logo image. For instance, the scoreboard page at http://mlb.mlb.com/mlb/scoreboard has logos for several of the teams. To determine their location, right-click one of the images, and select Copy Image Location (or its equivalent in whatever web browser you are using), and use that location to open a new page. Right-clicking on the Chicago Cubs image, for instance, produced this URL: + + + +http://mlb.mlb.com/mlb/images/team_logos/logo_chc_small.gif + + + +The chc is the three-letter team code found earlier when downloading team data, which suggests that logos for the remaining 29 teams can be found by substituting the code for each team in place of those three letters. The list of codes is one column of the teams.tsv file created in the previous step. 
In a new sketch, enter the team codes as part of a String array. Put quotes around each to specify that they are String objects, and commas between each to set apart the list. The following syntax shows how to create a String array already populated with information (rather than loading it from a file). + + +String[] teams = { + + + +"ana", "ari", "atl", "bal", "bos", "chc", "cin", "cle", + + + +"col", "cws", "det", "fla", "hou", "kc", "la", "mil", + + + +"min", "nym", "nyy", "oak", "phi", "pit", "sd", + + + +"sea", "sf", "stl", "tb", "tex", "tor", "was" + + +}; + + +The fact that the file name includes _small in the title suggests that there are images of other sizes. A first thing to try is suffixes like _large or _medium, though neither works in this situation. It may even be possible to look at the directory that contains the logos (http://mlb.mlb.com/mlb/images/team_logos/) and get a file listing, but this generally only works for smaller (or less professional) web sites. + + +Of course, the locations for the images are subject to change at any time (and often will), which is why we are spending time to go through the process of figuring out the image locations. + + +The next alternative to finding other images (aside from digging around the site for images in other shapes and sizes) is to use a search engine. Do a search for the first part of the URL and see what sort of results turn up. 
Doing a search for mlb/images/team_logos/ reveals several additional possibilities: + + + +http://mlb.mlb.com/mlb/images/team_logos/logo_atl_small.gif + + + + +http://mlb.mlb.com/mlb/images/team_logos/50x50/atl.gif + + + + +http://mlb.mlb.com/mlb/images/team_logos/logo_bal_79x76.jpg + + + + +http://mlb.mlb.com/mlb/images/team_logos/51x21/bos_standings_logo.gif + + + +A fifth on another site shows yet another format: + + + +http://losangeles.angels.mlb.com/mlb/images/team_logos/100x100/ana.gif + + + +Though the similarities in the directory structure suggest that the site is merely an alias, and a quick test confirms that the following works in an identical manner: + + + +http://mlb.mlb.com/mlb/images/team_logos/100x100/ana.gif + + + +For each of the URLs in question, the team code is used between a prefix and suffix specific to the image size and location. In the case of the small logos, the prefix is the following: + + + +http://mlb.mlb.com/mlb/images/team_logos/logo_ + + + +followed by the two- or three-letter team code, and the suffix: + + +_small.gif + + +With all this in mind, a short program can be used to download each set of images: + + +String[] teams = { + + + +"ana", "ari", "atl", "bal", "bos", "chc", "cin", "cle", + + + +"col", "cws", "det", "fla", "hou", "kc", "la", "mil", + + + +"min", "nym", "nyy", "oak", "phi", "pit", "sd", + + + +"sea", "sf", "stl", "tb", "tex", "tor", "was" + + +}; + + + + + +void setup() { + + + +grabLogos("small", "http://mlb.mlb.com/mlb/images/team_logos/logo_", "_small.gif"); + + + +grabLogos("50x50", "http://mlb.mlb.com/mlb/images/team_logos/50x50/", ".gif"); + + + +grabLogos("79x76", "http://mlb.mlb.com/mlb/images/team_logos/logo_", "_79x76.jpg"); + + + +grabLogos("standings", "http://mlb.mlb.com/mlb/images/team_logos/51x21/", "_standings_logo.gif"); + + + +grabLogos("100x100", "http://mlb.mlb.com/mlb/images/team_logos/100x100/", ".gif"); + + +} + + + + + +void grabLogos(String folder, String prefix, String suffix) { + + + 
+String extension = suffix.substring(suffix.length() - 4); + + + +for (int i = 0; i < teams.length; i++) { + + + +String filename = folder + "/" + teams[i] + extension; + + + +String url = prefix + teams[i] + suffix; + + + +println("Downloading " + url); + + + +saveStream(filename, url); + + + +} + + +} + + +The teams array contains the list of the 30 team codes. The grabLogos() method iterates through each team, downloading images based on the specified prefix and suffix. The saveStream() method handles loading the data available at a particular web address and writing it back to the disk (it's equivalent to using the built-in function loadBytes(), followed by saveBytes()). Because the image may be a .jpg or .gif file, the grabLogos() method uses substring() on the suffix to take its last four characters as the extension to use when naming the downloaded file. + + +In the end, the small directory contains the most promising images (in terms of size and proportion). Start a new sketch, and use Sketch -> Show Sketch Folder to add these to the data folder. + +Preprocessing: Acquiring and Parsing Salary Data (Acquire, Parse, Filter) + +The next step is to find a list of the payroll for each of the teams. There appears to be no such feature on MLB.com, but the USA Today web site makes available a list of team payrolls here: + + + +http://usatoday.com/sports/baseball/salaries/totalpayroll.aspx?year=2007 + + + +The simplest method to get this information is to copy from your web browser and paste into an open document in your spreadsheet application of choice. If you're lucky, the table will be interpreted as tab delimited, so the columns will be preserved when pasting into the spreadsheet. + + +Another option is to use the import or link feature of your spreadsheet application. For instance, if using OpenOffice.org, create a new Calc document, and choose Insert -> Link To External Data... Paste the URL found above into the first text field that reads "URL of external data source." 
Pressing the enter (or return) key will populate the list of "Available tables/ranges." Scroll down and select HTML__BBSalTable from the list. + + +Figure 02 + + +Click OK to import the data. This will pick up more of the web page than necessary, but scrolling down to the 17th row and expanding columns A and B shows the list of teams: + + +Figure 03 + + +Delete all rows and columns except for the team name and the salary, and replace each team name with its two or three letter code. The commas and dollar signs will also need to be removed (a quick Find & Replace will take care of these). Finally, save the file as plain text in TSV format as salaries.tsv. A completed version of the salaries file can be found here: + + + +http://benfry.com/book/salaryper/salaries.tsv + + + +Of course, parsing the page and downloading the table could be handled in code, but the amount of information (30 team salaries) and the frequency at which it's updated (once a year) does not warrant an algorithmic solution. + + +[BOX] As a rule of thumb, I will only write code in cases where the time to write the code is less than (or equal to) double the amount of time it takes to do the process by hand. That is, if it takes three hours to do it by hand, and I can implement it in code in six hours or less, then code is preferred so that it can be easily updated. As a corollary, however, in situations like this one, the page structure will likely change more than the data itself. In such cases, writing a parser is usually a waste of time. + +Starting the actual program (Acquire, Parse, Filter, Mine) + +In the previous steps, we've managed to download files that represent the team names and logos, their salaries, and their standings on a given day. Having determined how to handle each type of information, and preprocessed parts of the information, we'll next pull it together into a single application. + + +Team Names and Codes + + +We'll first load the team names using a method named setupTeams(). 
This is similar to other examples where loadStrings() is used, followed by split() to break a line into individual columns. + + +int teamCount = 30; + + +String[] teamNames; + + +String[] teamCodes; + + +HashMap teamIndices; + + + + + +void setupTeams() { + + + +String[] lines = loadStrings("teams.tsv"); + + + + + + + +teamCount = lines.length; + + + +teamCodes = new String[teamCount]; + + + +teamNames = new String[teamCount]; + + + +teamIndices = new HashMap(); + + + + + + + +for (int i = 0; i < teamCount; i++) { + + + +String[] pieces = split(lines[i], TAB); + + + +teamCodes[i] = pieces[0]; + + + +teamNames[i] = pieces[1]; + + + +teamIndices.put(teamCodes[i], new Integer(i)); + + + +} + + +} + + + + + +int teamIndex(String teamCode) { + + + +Integer index = (Integer) teamIndices.get(teamCode); + + + +return index.intValue(); + + +} + + +Most important for the teamCodes and teamNames arrays is that they provide an ordering that can be used to anchor the data. That is, when the salary information is loaded, there is no way to know the exact order in which the teams will be found. The same is true for the win-loss standings, which will change from day to day. By mapping the teamCode to a particular team index (numbered 0 to 29), we can always ensure that data from each source is connected properly. + + +To map a teamCode to an integer index, a HashMap is used. The HashMap class is like a dictionary that connects two pieces of data, each an Object. The put() method adds a new entry to the map, while the get() method retrieves it. Because only objects can be used in HashMaps, it's necessary to wrap the int for the team's index in an Integer object, which was created for this purpose. The intValue() method extracts the original int from the Integer object. This is encapsulated by the teamIndex function, so that we don't have to think about HashMaps or Integers when writing the rest of the code. + + +[BOX] What's the difference between int and Integer? 
An int is a primitive type (like float and char), which contains an actual literal value, as opposed to an object, which is a reference to a block of related values. The distinction can be confusing, and often leads people to ask why not make everything an object? The answer is that objects create significant (and unnecessary) overhead for the purposes of how primitive types like int are used. For example, in a for loop with thousands of iterations, it would be silly to de-reference an Integer object used for the counter on each iteration. Because the int refers to a specific value, only one step is required to read or change it. The object refers to a location in memory, so the first step would be to check whether the location was valid. The number might be stored in a variable called value, so once the location in memory was determined to be correct, a check would be made to find the location of the value variable (and whether or not it existed). Then, the variable itself could be manipulated in some manner. While it may not sound like much, this sort of thing really makes a difference when dealing with thousands of values. Scripting languages often use objects for all values, which can contribute to their lack of speed. This is especially true of languages that are not "typed," where each piece of data must first be converted as it is used. That is, everything might be stored as string values, and then converted to an integer in any context where the value is used as an integer (such as counting in our for loop). That conversion makes the process even more time consuming. + + +Team Salaries + + +The salary data will be a list of ranked values, just like the team standings information. The parameters for ranked data will be: + + +A list of the values to be ranked (the amount of each team's payroll) + + +A list of how those values will be shown to the user (the number formatted as a dollar amount, with commas: $34,140,182). 
+ + +A list of the rank for each item, and a sorting order to be used when ranking. For instance, a higher payroll amount has a negative connotation, whereas a higher win-loss average has a positive connotation. In some cases having the data in ascending order might be more useful; in others, the opposite. + + +A means of keeping track of the highest and lowest values. + + +The ranked list is useful for salary data as well as the win-loss standings. It will also be useful when adapting this project to other types of data. Because it also requires a little bit of code to sort the information and calculate its minimum and maximum values, the RankedList class was created to provide a general purpose means of handling ranked data. Download this class from the book site and add it to your sketch: + + + +http://benfry.com/book/salaryper/RankedList.java + + + +The parameters described above are stored in the value, title, and rank arrays. To use the class for salary data, one need only extend the class. To do this, create a SalaryList class in a new tab. Its only contents are: + + +class SalaryList extends RankedList { + + + + + + + +SalaryList(String[] lines) { + + + +super(teamCount, false); + + + + + + + +for (int i = 0; i < teamCount; i++) { + + + +String pieces[] = split(lines[i], TAB); + + + + + + + +// First column is the two- or three-letter team code. + + + +int index = teamIndex(pieces[0]); + + + + + + + +// Second column is the salary as a number. + + + +value[index] = parseInt(pieces[1]); + + + + + + + +// Make the title in the format $NN,NNN,NNN + + + +int salary = (int) value[index]; + + + +title[index] = "$" + nfc(salary); + + + +} + + + +update(); + + + +} + + +} + + +The SalaryList method is the constructor for the class; it controls how an object is initialized. The super method calls the constructor of the parent class (known as the superclass). 
In this case, it runs RankedList(teamCount, false) to create a list with 30 entries in descending order (the false specifies not in ascending order). + + +The rest of the code is like our other parsing functions, except that it fills the value, title, and rank arrays with the values read from an array of Strings loaded from a file. The title variable for each is set to a dollar sign followed by the payroll number with commas inserted by the nfc() method. + + +After parsing the information, the call to update() runs a method inside RankedList that takes care of sorting the data and calculating the minimum and maximum values. + + +Back in the main tab, the setupSalaries() method creates the SalaryList. + + +SalaryList salaries; + + + + + +void setupSalaries() { + + + +String[] lines = loadStrings("salaries.tsv"); + + + +salaries = new SalaryList(lines); + + +} + + +Win-Loss Standings + + +The win-loss record is handled in a similar fashion. First, a modified version of our pre-processing code handles acquiring and parsing the standings data for a given day: + + +String[] acquireStandings(int year, int month, int day) { + + + +String filename = year + nf(month, 2) + nf(day, 2) + ".tsv"; + + + +String path = dataPath(filename); + + + +File file = new File(path); + + + +if (!file.exists() || (file.length() == 0)) { + + + +println("Downloading standings file " + filename); + + + +PrintWriter writer = createWriter(path); + + + + + + +String base = "http://mlb.mlb.com/components/game" + + + + +"/year_" + year + "/month_" + nf(month, 2) + "/day_" + nf(day, 2) + "/"; + + + + + + +// American League (AL) + + + +parseStandings(base + "standings_rs_ale.js", writer); + + + +parseStandings(base + "standings_rs_alc.js", writer); + + + +parseStandings(base + "standings_rs_alw.js", writer); + + + + + + +// National League (NL) + + + +parseStandings(base + "standings_rs_nle.js", writer); + + + +parseStandings(base + "standings_rs_nlc.js", writer); + + + +parseStandings(base +
"standings_rs_nlw.js", writer); + + + + + + +writer.flush(); + + + +writer.close(); + + + +} + + + +return loadStrings(filename); + + +} + + + + + +void parseStandings(String filename, PrintWriter writer) { + + + +String[] lines = loadStrings(filename); + + + +Pattern p = Pattern.compile("\\s+([\\w\\d]+):\\s'(.*)',?"); + + + + + + +String teamCode = ""; + + + +int wins = 0; + + + +int losses = 0; + + + + + + +for (int i = 0; i < lines.length; i++) { + + + +Matcher m = p.matcher(lines[i]); + + + + + + +if (m.matches()) { + + + +String attr = m.group(1); + + + +String value = m.group(2); + + + + + + +if (attr.equals("code")) { + + + +teamCode = value; + + + +} else if (attr.equals("w")) { + + + +wins = parseInt(value); + + + +} else if (attr.equals("l")) { + + + +losses = parseInt(value); + + + +} + + + + + + +} else { + + + +if (lines[i].startsWith("}")) { + + + +// This is the end of a group, write these values + + + +writer.println(teamCode + TAB + wins + TAB + losses); + + + +} + + + +} + + + +} + + +} + + +For data from May 2, 2007, the acquireStandings() method looks for a file named 20070502.tsv. If the file is not present, it downloads the data from MLB.com and parses it to create a filtered version that contains only the team code followed by the number of wins and then the number of losses. + + +This code is nearly identical to the standalone version discussed earlier in the pre-processing steps. One difference is the use of a File object and the dataPath() method. The dataPath() method gives a full path name to a file found in the data directory. This is useful when interfacing between Processing and Java file methods, because Java has no concept of the data folder. The File class is used in Java to store a reference to a particular file (or directory), and includes several useful methods like exists(), which we use here to determine whether the file is available or not. 
Here we also check to see if the file's length (size) is zero, which can happen if the acquireStandings method is interrupted, and the file is not completely written. + + +In a new tab called StandingsList, write a similar piece of code to the constructor for SalaryList. + + +class StandingsList extends RankedList { + + + + + + + +StandingsList(String[] lines) { + + + +super(teamCount, false); + + + + + + + +for (int i = 0; i < teamCount; i++) { + + + +String[] pieces = split(lines[i], TAB); + + + +int index = teamIndex(pieces[0]); + + + +int wins = parseInt(pieces[1]); + + + +int losses = parseInt(pieces[2]); + + + + + + + +value[index] = (float) wins / (float) (wins+losses); + + + +title[index] = wins + "-" + losses; + + + +} + + + +update(); + + + +} + + +} + + +And back in the main tab, loading the data works similarly, where standings information is acquired based on the current day. + + +StandingsList standings; + + + + + +void setupStandings() { + + + +String[] lines = acquireStandings(year(), month(), day()); + + + +standings = new StandingsList(lines); + + +} + + +Team Logos + + +All that remains now is to load the logo images for each team. These were downloaded earlier in the preprocessing step into a folder named small. Add this folder to the data folder of your sketch, and the following code to your program: + + +PImage[] logos; + + +float logoWidth; + + +float logoHeight; + + + + + +void setupLogos() { + + + +logos = new PImage[teamCount]; + + + +for (int i = 0; i < teamCount; i++) { + + + +logos[i] = loadImage("small/" + teamCodes[i] + ".gif"); + + + +} + + + +logoWidth = logos[0].width / 2.0; + + + +logoHeight = logos[0].height / 2.0; + + +} + + +This gets into how the logos are represented as well. Each logo is 38 pixels wide and 45 pixels tall. Some quick math will tell us that 45 pixels times 30 teams (1350 pixels) will not fit on the screen, or at least is unnecessarily large. However, half that height is just perfect for a 1024x768 display. 
Because the size of the logo images might change over the years (or with a different data set), the logoWidth and logoHeight variables are determined by using half the size of the first logo that is loaded. + + +Finishing setup() + + +The setup() method brings all this together and also sets up a font to use for showing the data. We'll begin with just the generic SansSerif font, but that will change later. + + +PFont font; + + + + + +void setup() { + + + +size(480, 750); + + + + + + +setupTeams(); + + + +setupSalaries(); + + + +setupStandings(); + + + +setupLogos(); + + + + + + +font = createFont("SansSerif", 11); + + + +textFont(font); + + +} + + +Later, we'll use salary as a tie-breaker when sorting the standings, so setupStandings() should be after setupSalaries(). + +Represent + +Bringing all this together, we can begin a simple representation to show each row of data with the team name, win-loss record, logo, and salary. We begin with a few constants, variables prefixed with static final because they will not change while the sketch is in use. Because the logo height is 22.5 pixels, we'll make each row 23 pixels tall. We'll want to center everything from the middle of the row, so the HALF_ROW_HEIGHT variable will also come in handy. + + +static final int ROW_HEIGHT = 23; + + +static final float HALF_ROW_HEIGHT = ROW_HEIGHT / 2.0; + + +static final int SIDE_PADDING = 30; + + +The text size set earlier is about half the height of each row. This makes easy-to-read double spaced text. The text itself needn't be particularly large or prominent, because it is not as important as the correlation line itself. + + +The SIDE_PADDING variable is used to set a border around the display, adding some white space to the edges. The amount should be more than the row height, so that it looks intentional, but not too large as to waste space. 
+ + +The draw() method reads as follows: + + +void draw() { + + + +background(255); + + + +smooth(); + + + + + + +translate(SIDE_PADDING, SIDE_PADDING); + + + + + + +float leftX = 160; + + + +float rightX = 335; + + + + + + + +textAlign(LEFT, CENTER); + + + + + + + +for (int i = 0; i < teamCount; i++) { + + + +fill(0); + + + +float standingsY = standings.getRank(i)*ROW_HEIGHT + HALF_ROW_HEIGHT; + + + +image(logos[i], 0, standingsY - logoHeight/2, logoWidth, logoHeight); + + + +text(teamNames[i], 28, standingsY); + + + +text(standings.getTitle(i), 115, standingsY); + + + + + + +float salaryY = salaries.getRank(i)*ROW_HEIGHT + HALF_ROW_HEIGHT; + + + + + + +stroke(0); + + + +line(leftX, standingsY, rightX, salaryY); + + + + + + +text(salaries.getTitle(i), rightX+10, salaryY); + + + +} + + +} + + +The translate() method moves the coordinate system over slightly, giving us a white border: (0, 0) will now be (30, 30), so nothing will be drawn in the top and left 30 pixels of the image. + + +The leftX and rightX values could also be constants (e.g. static final int LEFT_X = 160;), but we'll leave them as variables in case we later want to dynamically figure out the position of each column. A better idea than the current implementation (where the X-coordinates were determined by trial and error) would be to base the positions on the maximum width of each column of text plus a little extra padding. + + +The textAlign() method is used to left-align and vertically center each row of text. + + +A loop iterates through each team index (represented by i). The text() method draws the team name, aligned to the left, and then the standings value (e.g. 40-29) next to it, both vertically centered on the row. + + +The standingsY and salaryY variables are calculated by the rank of the given team, multiplied by the row height, plus HALF_ROW_HEIGHT so that the line shows up in the center.
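The row arithmetic described above can be checked in isolation. The following is only a minimal sketch in plain Java, not part of the sketch itself: the RowLayout class and rowCenterY() helper are hypothetical names introduced here for illustration, and the example assumes that ranks are zero-based (the first-place team has rank 0).

```java
// Hypothetical helper that mirrors the standingsY/salaryY calculation.
// Assumes ranks start at 0 for the top row.
public class RowLayout {
    static final int ROW_HEIGHT = 23;
    static final float HALF_ROW_HEIGHT = ROW_HEIGHT / 2.0f;

    // Vertical center of the row for a given rank: skip 'rank' full rows,
    // then move down half a row to land in the middle.
    static float rowCenterY(int rank) {
        return rank * ROW_HEIGHT + HALF_ROW_HEIGHT;
    }

    public static void main(String[] args) {
        // First, second, and last of 30 rows.
        for (int rank : new int[] {0, 1, 29}) {
            System.out.println("rank " + rank + " -> y = " + rowCenterY(rank));
        }
    }
}
```

Under these assumptions the first row is centered at y = 11.5 and the thirtieth at y = 678.5, before the translate() offset is applied.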
+ + +The resulting image looks like this: + + +Figure 04 + +Refine + +When reaching the refinement stage, always return to the original question. We're most concerned with how salaries relate to performance for each team. In the current image, the team logos are the most prominent visual elements (because they're in color), while the lines (the most important feature) are about as informative as a pile of sticks. + + +Improving the lines + + +The first metric for the original question is whether teams are spending their money well. At its most basic, this is a yes or no question, so it will be important to highlight it as such with the representation. Teams spending their money well have a line that gets lower as it moves from left to right (connecting a high ranking in the standings to a low salary), whereas teams wasting money have lines that move upwards from left to right. By using a color for each scenario, we can highlight the answer to the boolean question of whether the team is spending its money well. Color is a good choice in this case because we only need a pair of colors, and the detail being shown with the color is more important than any other feature in the diagram. To apply the colors, replace the stroke(0) line with: + + + +if (salaryY >= standingsY) { + + + +stroke(33, 85, 156); // Blue for positive (or equal) difference. + + + +} else { + + + +stroke(206, 0, 82); // Red for wasting money. + + + +} + + +Figure 05 + + +But even with colors, the lines still are not very clear because their thickness doesn't vary. Variation is an important cue used by our brains to help us differentiate between elements and determine what shapes are related to one another. + + +To introduce more variation into the lines, we can vary the stroke weight based on the team's salary.
We could do the same thing with the record, but payroll is more intuitive, as it refers to "bigger" or "smaller" teams—we don't think of standings as big or small, but we do think about monetary amounts in these terms. + + +The variation is handled with the map() method, mapping the minimum salary to a very thin stroke (0.25) and the largest salary to a nice, thick line. Add this code before the line() statement to scale the line weights in proportion to each team's salary: + + + +float weight = map(salaries.getValue(i), + + + +salaries.getMinValue(), salaries.getMaxValue(), + + + +0.25, 6); + + + +strokeWeight(weight); + + +Figure 06 + + +The image is getting more readable than the original in Figure 4, but still more can be done. + + +A better typeface + + +Instead of the generic SansSerif font, a better option is Matthew Carter's Georgia. The size is also upped a notch to match the amount of space used by the original font: + + +font = createFont("Georgia", 12); + + +Carter designed the typeface for Microsoft in 1993 as part of their Web core fonts initiative, as Microsoft's typography group sought better screen fonts that could differentiate Windows and other Microsoft products from their competitors. The Web core fonts package was available as a free download, and is a default font on Windows systems, and installed along with Microsoft software (such as Office) on Mac OS X. This makes it reasonably safe to expect the font to be installed on other machines, rather than using the Create Font Tool. On Linux, the fonts are available from a SourceForge project that repackages the fonts for easy installation. This package is also available as part of some Linux distributions. + + + +http://sourceforge.net/projects/corefonts/ + + + + +http://sourceforge.net/project/showfiles.php?group_id=34153 + + + +The font is a good option because it has elegant non-lining numerals, which will make the number-rich display a little more appealing. 
Also called old style figures, these digits extend above and below the baseline. The disadvantage in this case is that the numbers won't be identical widths, making them more difficult to compare against one another. Equal widths are usually a helpful aid when dealing with type: right-aligning a series of numbers makes their magnitude obvious at a quick glance. In this piece, however, the exact numbers are less important (whether the Yankees are being paid $189,639,045 or $189,638,042 is not an important distinction) because the numbers are already shown in rank order (the most important axis), so we can sacrifice a little bit of the readability. + + +The text still carries too much visual weight, so it needs to be faded a bit. Replacing the fill(0) statement with fill(128) will make the text gray, which helps balance the text with the colored lines, appropriately returning the greatest visual importance to the lines themselves. + + +Taken together, the new version of the draw() method follows, with altered portions marked in bold. + + +void draw() { + + + +background(255); + + + +smooth(); + + + + + + +translate(SIDE_PADDING, SIDE_PADDING); + + + + + + +float leftX = 160; + + + +float rightX = 335; + + + + + + + +for (int i = 0; i < teamCount; i++) { + + + +fill(128); + + + +float standingsY = standings.getRank(i)*ROW_HEIGHT + HALF_ROW_HEIGHT; + + + +image(logos[i], 0, standingsY - logoHeight/2, logoWidth, logoHeight); + + + +textAlign(LEFT, CENTER); + + + +text(teamNames[i], 28, standingsY); + + + +textAlign(RIGHT, CENTER); + + + +text(standings.getTitle(i), leftX-10, standingsY); + + + + + + +float salaryY = salaries.getRank(i)*ROW_HEIGHT + HALF_ROW_HEIGHT; + + + +if (salaryY >= standingsY) { + + + +stroke(33, 85, 156); // Blue for positive (or equal) difference. + + + +} else { + + + +stroke(206, 0, 82); // Red for wasting money.
+ + + +} + + + +float weight = map(salaries.getValue(i), + + + +salaries.getMinValue(), salaries.getMaxValue(), + + + +0.25, 6); + + + +strokeWeight(weight); + + + + + + + +line(leftX, standingsY, rightX, salaryY); + + + + + + +fill(128); + + + +textAlign(LEFT, CENTER); + + + +text(salaries.getTitle(i), rightX+10, salaryY); + + + +} + + +} + + +Figure 07 + + +A word about dashes and numbers + + +The hyphen used in the win-loss record looks a little wimpy because hyphens are so small. A better solution is to use the en dash character by changing this line from the StandingsList constructor: + + + +title[index] = wins + "-" + losses; + + +to read as follows: + + + +title[index] = wins + "\u2013" + losses; + + +Robert Bringhurst's The Elements of Typographic Style defines the en dash as suitable for use when separating values that can be broken with the word "to." In this case, the 40-21 next to the Red Sox can be stated as "the Red Sox have a record of 40 to 21," making the en dash suitable for this situation. More about the Bringhurst text can be found in the Refine chapter. + + +The en dash is specified by "\u2013", which is a Unicode escape sequence. A Unicode escape is a \u followed by four hex digits for the character's number in the Unicode character set. More about Unicode can be found in the Parse chapter. Other types of dashes can be used, such as the em dash, "\u2014", or the minus sign, "\u2212". Using the en dash also has the benefit of ensuring that the vertical position of the dash will align nicely with the horizontal bars of the numbers that it separates. + + +Using salary as a tie-breaker + + +Another alteration to the StandingsList is to improve how ties are handled. When two teams have an identical record (not an uncommon occurrence, especially early in the season), the tie should go to the team with the lower salary. + + +Inside RankedList, sorting is handled based on a function that compares two elements in the list.
This is common to most sorting algorithms: a comparison function returns zero if the items are identical, and a negative or positive number to indicate whether the first value is less than or greater than the second. + + +Writing a new compare() method lets us specify a more sophisticated sort. In the modified method, the compare() method of the superclass (RankedList) is called first. If the comparison is a value besides zero, then that means the items are not identical, and the original value can be used. But if the values are identical, the comparison function from the salaries object is used. Because values for a and b refer to the same team in both the standings and salaries (they were ordered using the teamIndex() function as they were loaded), the comparison works. + + +class StandingsList extends RankedList { + + + + + + + +StandingsList(String[] lines) { + + + +super(teamCount, false); + + + + + + + +for (int i = 0; i < teamCount; i++) { + + + +String[] pieces = split(lines[i], TAB); + + + +int index = teamIndex(pieces[0]); + + + +int wins = parseInt(pieces[1]); + + + +int losses = parseInt(pieces[2]); + + + + + + + +value[index] = (float) wins / (float) (wins+losses); + + + +title[index] = wins + "\u2013" + losses; + + + +} + + + +update(); + + + +} + + + + + + +float compare(int a, int b) { + + + +// First compare based on the record of both teams + + + +float amt = super.compare(a, b); + + + +// If the record is not identical, return the difference + + + +if (amt != 0) return amt; + + + + + + +// If records are equal, use salary as tie-breaker. + + + +// In this case, a and b are switched, because a higher + + + +// salary is a negative thing, unlike the values above. + + + +return salaries.compare(b, a); + + + +} + + +} + +Moving to multiple days (Interact) + +So far we've covered a lot of data parsing and some visual refinement.
But the static image is unfulfilling: the season changes from day to day, as teams improve, tank, and go on winning streaks. The code used to parse the information for a particular day can easily be adapted to other days, so long as we have a means for iterating through days of the season, and knowing which days to use. + + +Dates and time are trickier than you might think. An initial temptation is to simply make an array of numbers for the days in each month. But what happens in a leap year? Do you use a different version of your code? The Java API contains a Date class that can convert between a long value (like an int, but can store far larger numbers) and a formatted date, which is handy. A companion class, SimpleDateFormat, can parse a date from a String object given a template, or convert from a Date object to a date formatted using the same template. + + +The long value of a date is the number of milliseconds elapsed between January 1, 1970 (known as the "Unix epoch" or "POSIX time") and that date/time. Given a starting value, moving to the next day is a matter of increasing the variable by the number of milliseconds in a day. Doing this in a loop will generate all the days of an entire season. + + +The code that follows takes as input a date stamp for the first day of the season (firstDateStamp) in the format YYYYMMDD, and the same for the final day of the season (lastDateStamp). Because no data is available past the current day, the maximum date for which information can be downloaded is today. However, results for the current day will always be incomplete, so it's best to only get results up to the previous day. This limit is stored in maxDateIndex. + + + +String firstDateStamp = "20070401"; + + + +String lastDateStamp = "20070930"; + + + +String todayDateStamp; + + + + + + +static final long MILLIS_PER_DAY = 24 * 60 * 60 * 1000; + + + + + + +// The number of days in the entire season. + + + +int dateCount; + + + +// The current date being shown.
+ + + +int dateIndex; + + + +// Don't show the first 10 days, they're too erratic. + + + +int minDateIndex = 10; + + + +// The last day of the season, or yesterday, if the season is ongoing. + + + +// This is the maximum date that can be viewed. + + + +int maxDateIndex; + + + + + + +// This format makes "20070704" from the date July 4, 2007. + + + +DateFormat stampFormat = new SimpleDateFormat("yyyyMMdd"); + + + +// This format makes "4 July 2007" from the same. + + + +DateFormat prettyFormat = new SimpleDateFormat("d MMMM yyyy"); + + + + + + +// All dates for the season formatted with stampFormat. + + + +String[] dateStamp; + + + +// All dates in the season formatted with prettyFormat. + + + +String[] datePretty; + + + + + + +void setupDates() { + + + +try { + + + +Date firstDate = stampFormat.parse(firstDateStamp); + + + +long firstDateMillis = firstDate.getTime(); + + + +Date lastDate = stampFormat.parse(lastDateStamp); + + + +long lastDateMillis = lastDate.getTime(); + + + + + + +// Calculate number of days by dividing the total milliseconds + + + +// between the first and last dates by the number of milliseconds per day + + + +dateCount = (int) + + + +((lastDateMillis - firstDateMillis) / MILLIS_PER_DAY) + 1; + + + +// Default to the last valid index (the final day of the season) + + + +maxDateIndex = dateCount - 1; + + + +dateStamp = new String[dateCount]; + + + +datePretty = new String[dateCount]; + + + + + + +todayDateStamp = year() + nf(month(), 2) + nf(day(), 2); + + + +// Another option to do this, but more code + + + +//Date today = new Date(); + + + +//String todayDateStamp = stampFormat.format(today); + + + + + + + +for (int i = 0; i < dateCount; i++) { + + + +Date date = new Date(firstDateMillis + MILLIS_PER_DAY*i); + + + +datePretty[i] = prettyFormat.format(date); + + + +dateStamp[i] = stampFormat.format(date); + + + +// If this value for 'date' is equal to today, then set the previous + + + +// day as the maximum viewable date, because it means the season is + + + +// still ongoing.
The previous day is used because unless it is late + + + +// in the evening, the updated numbers for the day will be unavailable + + + +// or incomplete. + + + +if (dateStamp[i].equals(todayDateStamp)) { + + + +maxDateIndex = i-1; + + + +} + + + +} + + + +} catch (ParseException e) { + + + +die("Problem while setting up dates", e); + + + +} + + + +} + + +The primary result of this function is to set maxDateIndex and to calculate all dates in the entire season in two formats (the dateStamp and datePretty arrays) so that they can be used elsewhere. + + +The code above is designed to be more general than the previously mentioned array that holds the number of days in each month. The original version of the project used the simpler method, since hand-tweaking was a quick fix and not a problem (February isn't part of the baseball season, so no leap year considerations have to be made). But you might adapt this project to another situation, such as the football season, which spans from fall to winter (meaning that the months count 10, 11, 12, then 1), so it seemed more prudent to show a generic alternative here that can be more easily adapted. + + +If running this code online, the firstDateStamp and lastDateStamp could even be pulled from an HTML parameter using the built-in param() method, which can read HTML tags for such parameters. This way, different years could be shown without needing to recompile the applet. + + +Drawing the dates + + +At the top of the screen we'll add a simple date selector. The selector will consist of a series of vertical lines, with the current date shown as a longer line, and the title of the date (taken from datePretty) shown beneath it.
+ + + +int dateSelectorX; + + + +int dateSelectorY = 30; + + + + + + +// Draw a series of lines for selecting the date + + + +void drawDateSelector() { + + + +dateSelectorX = (width - dateCount*2) / 2; + + + + + + +strokeWeight(1); + + + +for (int i = 0; i < dateCount; i++) { + + + +int x = dateSelectorX + i*2; + + + + + + +// If this is the currently selected date, draw it differently + + + +if (i == dateIndex) { + + + +stroke(0); + + + +line(x, 0, x, 13); + + + +textAlign(CENTER, TOP); + + + +text(datePretty[dateIndex], x, 15); + + + + + + +} else { + + + +// If this is a viewable date, make the line darker + + + +if ((i >= minDateIndex) && (i <= maxDateIndex)) { + + + +stroke(128); // Viewable date + + + +} else { + + + +stroke(204); // Not a viewable date + + + +} + + + +line(x, 0, x, 7); + + + +} + + + +} + + + +} + + +Load standings for the entire season + + +An update to the setupStandings() function downloads data for each day of the season (if it has not yet been downloaded), and the season array stores each day of standings for the season thus far. + + +StandingsList[] season; + + + + + +void setupStandings() { + + + +season = new StandingsList[maxDateIndex + 1]; + + + +for (int i = minDateIndex; i <= maxDateIndex; i++) { + + + +String[] lines = acquireStandings(dateStamp[i]); + + + +season[i] = new StandingsList(lines); + + + +} + + +} + + +Another version of the acquireStandings() method also breaks up a date stamp into its component parts so that it can be handled by the original acquireStandings method: + + +String[] acquireStandings(String stamp) { + + + +int year = int(stamp.substring(0, 4)); + + + +int month = int(stamp.substring(4, 6)); + + + +int day = int(stamp.substring(6, 8)); + + + +return acquireStandings(year, month, day); + + +} + + +Switching between dates + + +With all the data in place, selecting dates is a matter of determining where the mouse was clicked inside the date selector area. 
The mousePressed() and mouseDragged() methods will both hand off to a single handleMouse() method that calculates whether a new date was chosen: + + +void setDate(int index) { + + + +dateIndex = index; + + + +standings = season[dateIndex]; + + +} + + + + + +void mousePressed() { + + + +handleMouse(); + + +} + + + + + + +void mouseDragged() { + + + +handleMouse(); + + +} + + + + + + +void handleMouse() { + + + +if (mouseY < dateSelectorY) { + + + +int date = (mouseX - dateSelectorX) / 2; + + + +setDate(constrain(date, minDateIndex, maxDateIndex)); + + + +} + + +} + + +And just for kicks, we add a keyPressed() method so that we can use the arrow keys to move back and forth in time: + + +void keyPressed() { + + + +if (key == CODED) { + + + +if (keyCode == LEFT) { + + + +int newDate = max(dateIndex - 1, minDateIndex); + + + +setDate(newDate); + + + + + + +} else if (keyCode == RIGHT) { + + + +int newDate = min(dateIndex + 1, maxDateIndex); + + + +setDate(newDate); + + + +} + + + +} + + +} + + +Checking our progress + + +The only visible progress can be seen in the date selector at the top of the screen: + + +Figure 08 + + +But by clicking and dragging across the date selector, the display will rapidly switch between the standings for each day. The update is too jerky, which makes it difficult to follow. As you might guess, we'll next bring back our Integrator friend to help smooth things out. + +Smoothing out the interaction (Refine) + +In what is perhaps becoming a common refrain, we'll next be adding the Integrator class to the sketch, which will help us animate the transition between days: + + + +http://benfry.com/book/salaryper/Integrator.java + + + +The only values that move are the 30 standings positions, so we'll add a setupRanking() function to initialize them and set a default position. Add a call to setupRanking() inside setup(), just after the other setupXxxxx() functions.
+ + +Integrator[] standingsPosition; + + + + + +void setupRanking() { + + + +standingsPosition = new Integrator[teamCount]; + + + +for (int i = 0; i < teamCodes.length; i++) { + + + +standingsPosition[i] = new Integrator(i); + + + +} + + +} + + +Inside draw(), we'll no longer use getRank() to determine the location for standingsY, as in this line: + + +float standingsY = standings.getRank(i)*ROW_HEIGHT + HALF_ROW_HEIGHT; + + +Instead, it will be based on the current position of each Integrator (each of which is taking its sweet time to reach the current ranking) rather than the actual ranking value: + + +float standingsY = standingsPosition[i].value * ROW_HEIGHT + HALF_ROW_HEIGHT; + + +At the beginning of draw(), it's also necessary to update each standingsPosition. As a twist, we'll also keep track of whether any of the Integrators actually change inside their update() method (which returns true if the value actually changed by some amount). If no changes occur, then we'll use noLoop() to shut off the animation loop and save CPU cycles: + + + +boolean updated = false; + + + +for (int i = 0; i < teamCount; i++) { + + + +if (standingsPosition[i].update()) { + + + +updated = true; + + + +} + + + +} + + + +if (!updated) { + + + +noLoop(); + + + +} + + +Of course, we will eventually need to turn the animation back on when the user selects a new date. An updated setDate() method will handle targeting the new ranking values with each Integrator, and starting up the animation loop by calling loop(). + + +void setDate(int index) { + + + +dateIndex = index; + + + +standings = season[dateIndex]; + + + + + + +for (int i = 0; i < teamCount; i++) { + + + +standingsPosition[i].target(standings.getRank(i)); + + + +} + + + +// Re-enable the animation loop + + + +loop(); + + +} + + +Also regarding animation, it's important to set a frame rate at which the sketch should be run, so that it behaves consistently on other machines.
Adding frameRate(15) to setup() will ensure that transitions behave smoothly and the animation is consistent even on much faster computers. + +Deployment considerations (Acquire, Parse, Filter) + +As discussed in the second chapter, sketches that run online inside a web browser are not allowed access to the user's local file system (for security reasons). That means our current scheme of downloading files for each day and using the File object to check whether they've already been downloaded won't be sufficient. + + +As it turns out, the current implementation is also quite inefficient: at the end of the season you'll have hundreds of individual files on your disk, one for each day, each occupying only about 300 bytes. + + +So instead, we return to the early preprocessing steps. The solution for both problems is to run the pre-processing steps from a CGI script. The script can download the data once for each day, then join each of the files into a single one that can be downloaded by a web visitor. If the CGI script runs from the same server as the sketch, the sketch will be able to connect to it and download the data, since connecting back to its parent server is considered safe under Java's security model. + + +A Perl version of the script, essentially an adaptation of the acquireStandings() and parseWinLoss() methods, follows. Creating a version for PHP or other web frameworks shouldn't be too much of a stretch. + + +#!/usr/bin/perl -w + + + + + +use Time::Local; + + + + + +# Send header to the web server to indicate we are awake, + + +# and that plain text data will be returned. + + +print "Content-type: text/plain\n\n"; + + + + + +# These values could be read from parameters to the CGI if so desired, e.g. + + +# http://benfry.com/salaryper/data.cgi?first=20070401&last=20070930&min=10 + + +# This would make the software flexible enough to use for multiple years.
+ + +$firstDateStamp = '20070401'; + + +$lastDateStamp = '20070930'; + + +$minDateIndex = 10; + + + + + +$dataFolder = 'individual'; + + +$comboFolder = 'combined'; + + +`mkdir -p $dataFolder`; + + +`mkdir -p $comboFolder`; + + + + + +$firstDateStamp =~ /(\d\d\d\d)(\d\d)(\d\d)/; + + +$year = $1; + + +$month = $2 - 1; # Months are 0-indexed in Perl + + +$day = $3; + + +$firstDate = timelocal(0, 0, 0, $day, $month, $year); + + + + + +$lastDateStamp =~ /(\d\d\d\d)(\d\d)(\d\d)/; + + +$year = $1; + + +$month = $2 - 1; # Months are 0-indexed in Perl + + +$day = $3; + + +$lastDate = timelocal(0, 0, 0, $day, $month, $year); + + + + + +$SECONDS_PER_DAY = 24 * 60 * 60; + + + + + +# Yesterday is the maximum possible date, + + +# because the scores from today will not yet be updated. + + +$yesterdayDate = time - $SECONDS_PER_DAY; + + + + + +# Don't bother grabbing data for the earlier part of the season + + +# because it will not be used (and the program is not expecting it) + + +$date = $firstDate + $minDateIndex*$SECONDS_PER_DAY; + + + + + +my @dateStamps = (); + + + + + +# If season is ongoing, only read data through yesterday. + + +$endDate = ($yesterdayDate < $lastDate) ? $yesterdayDate : $lastDate; + + +while ($date <= $endDate) { + + + +($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = + + + +localtime($date); + + + +$stamp = sprintf("%04d%02d%02d", $year + 1900, $mon+1, $mday); + + + +push @dateStamps, $stamp; + + + +#print "$date - " . localtime($date) . "\n"; + + + +$date += $SECONDS_PER_DAY; + + +} + + +$endDateStamp = $dateStamps[$#dateStamps]; + + + + + +$combinedFile = "$comboFolder/$endDateStamp.tsv"; + + +if (-f $combinedFile) { + + + +# Open the file and spew the contents back to the applet. + + + +open(INPUT, $combinedFile) || die $!; + + + +@contents = <INPUT>; + + + +print @contents; + + + +close(INPUT); + + + + + +} else { + + + +# Download any days not yet downloaded. 
+ + + +foreach $stamp (@dateStamps) { + + + +$filename = "$dataFolder/$stamp.tsv"; + + + +if (!(-f $filename)) { + + + +downloadWinLoss($stamp); + + + +} + + + +} + + + +# Concatenate everything into a single file. + + + +open(OUTPUT, ">$combinedFile") || die $!; + + + +foreach $stamp (@dateStamps) { + + + +open(INPUT, "$dataFolder/$stamp.tsv") || die $!; + + + +@contents = <INPUT>; + + + +print OUTPUT @contents; + + + +close(INPUT); + + + + + + +# Also write the contents of this file to the applet. + + + +print @contents; + + + +} + + + +close(OUTPUT); + + +} + + + + + + + + +sub downloadWinLoss() { + + + +my $stamp = shift; + + + + + + + +open(OUTPUT, ">$dataFolder/$stamp.tsv") || die $!; + + + + + + +$stamp =~ /(\d\d\d\d)(\d\d)(\d\d)/; + + + +$day = sprintf("year_%04d/month_%02d/day_%02d/", $1, $2, $3); + + + + + + +$base = 'http://mlb.mlb.com/components/game/' . $day; + + + + + + +parseWinLoss($base . 'standings_rs_ale.js'); + + + +parseWinLoss($base . 'standings_rs_alw.js'); + + + +parseWinLoss($base . 'standings_rs_alc.js'); + + + + + + +parseWinLoss($base . 'standings_rs_nle.js'); + + + +parseWinLoss($base . 'standings_rs_nlw.js'); + + + +parseWinLoss($base . 
'standings_rs_nlc.js'); + + + + + + +close(OUTPUT); + + +} + + + + + + + + +sub parseWinLoss() { + + + +$url = shift; + + + +# Download the contents of the .js file using "curl" + + + +@lines = `curl --silent $url`; + + + + + + +$teamCode = ''; + + + +$wins = 0; + + + +$losses = 0; + + + + + + +foreach $line (@lines) { + + + +if ($line =~ /\s+([\w\d]+):\s'(.*)',?/) { + + + +$attr = $1; + + + +$value = $2; + + + +if ($attr eq 'code') { + + + +$teamCode = $value; + + + +} elsif ($attr eq 'w') { + + + +$wins = $value; + + + +} elsif ($attr eq 'l') { + + + +$losses = $value; + + + +} + + + + + + +} elsif ($line =~ /^}/) { + + + +# This is the end of a group, print the values + + + +print OUTPUT "$teamCode\t$wins\t$losses\n"; + + + +} + + + +} + + +} + + +The script can be seen in action at: + + + +http://benfry.com/book/salaryper/mlb.cgi + + + +Or downloaded directly from: + + + +http://benfry.com/book/salaryper/mlb.cgi.txt + + + +If data has not yet been downloaded for the most recent day, the script downloads the new information, then produces a file that concatenates all the days found so far. If this has already happened, the combined file is simply echoed back to the web server. + + +Because this also moves all the preprocessing code out of the sketch, the acquireStandings() and parseWinLoss() methods can be removed from the code, simplifying things greatly. The new version of setupStandings() instead uses a URL to download the data, and then creates a new StandingsList for each set of 30 lines. The maxDateIndex is determined by the amount of data received from the CGI script, and it's important to keep the minDateIndex variable in your code in sync with the minDateIndex value used in the CGI, so that both pieces of software expect the same day to be the first. + + +The complete code follows.
+ + +import java.util.regex.*; + + + + + +int teamCount = 30; + + +String[] teamNames; + + +String[] teamCodes; + + +HashMap teamIndices; + + + + + + +static final int ROW_HEIGHT = 23; + + +static final float HALF_ROW_HEIGHT = ROW_HEIGHT / 2.0f; + + + + + +static final int SIDE_PADDING = 30; + + +static final int TOP_PADDING = 40; + + + + + +SalaryList salaries; + + +StandingsList standings; + + + + + + +StandingsList[] season; + + +Integrator[] standingsPosition; + + + + + +PImage[] logos; + + +float logoWidth; + + +float logoHeight; + + + + + +PFont font; + + + + + + + + + +// . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + + + + + + + + + +String firstDateStamp = "20070401"; + + +String lastDateStamp = "20070930"; + + +String todayDateStamp; + + + + + +static final long MILLIS_PER_DAY = 24 * 60 * 60 * 1000; + + + + + +// The number of days in the entire season. + + +int dateCount; + + +// The current date being shown. + + +int dateIndex; + + +// Don't show the first 10 days, they're too erratic. + + +int minDateIndex = 10; + + +// The last day of the season, or yesterday, if the season is ongoing. + + +// This is the maximum date that can be viewed. + + +int maxDateIndex; + + + + + +// This format makes "20070704" from the date July 4, 2007. + + +DateFormat stampFormat = new SimpleDateFormat("yyyyMMdd"); + + +// This format makes "4 July 2007" from the same. + + +DateFormat prettyFormat = new SimpleDateFormat("d MMMM yyyy"); + + + + + +// All dates for the season formatted with stampFormat. + + +String[] dateStamp; + + +// All dates in the season formatted with prettyFormat. 
+ + +String[] datePretty; + + + + + +void setupDates() { + + + +try { + + + +Date firstDate = stampFormat.parse(firstDateStamp); + + + +long firstDateMillis = firstDate.getTime(); + + + +Date lastDate = stampFormat.parse(lastDateStamp); + + + +long lastDateMillis = lastDate.getTime(); + + + + + + +// Calculate number of days by dividing the total milliseconds + + + +// between the first and last dates by the number of milliseconds per day + + + +dateCount = (int) + + + +((lastDateMillis - firstDateMillis) / MILLIS_PER_DAY) + 1; + + + +maxDateIndex = dateCount - 1; // Index of the last day of the season + + + +dateStamp = new String[dateCount]; + + + +datePretty = new String[dateCount]; + + + + + + +todayDateStamp = year() + nf(month(), 2) + nf(day(), 2); + + + +// Another option to do this, but more code + + + +//Date today = new Date(); + + + +//String todayDateStamp = stampFormat.format(today); + + + + + + + +for (int i = 0; i < dateCount; i++) { + + + +Date date = new Date(firstDateMillis + MILLIS_PER_DAY*i); + + + +datePretty[i] = prettyFormat.format(date); + + + +dateStamp[i] = stampFormat.format(date); + + + +// If this value for 'date' is equal to today, then set the previous + + + +// day as the maximum viewable date, because it means the season is + + + +// still ongoing. The previous day is used because unless it is late + + + +// in the evening, the updated numbers for the day will be unavailable + + + +// or incomplete. + + + +if (dateStamp[i].equals(todayDateStamp)) { + + + +maxDateIndex = i-1; + + + +} + + + +} + + + +} catch (ParseException e) { + + + +die("Problem while setting up dates", e); + + + +} + + +} + + + + + + + + +// . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + + + + + + + + +public void setup() { + + + +size(480, 750); + + + + + + + +setupTeams(); + + + +setupDates(); + + + +setupSalaries(); + + + +// Load the standings after the salaries, because salary + + + +// will be used as the tie-breaker when sorting.
+ + + +setupStandings(); + + + +setupRanking(); + + + +setupLogos(); + + + + + + + +font = createFont("Georgia", 12); + + + +textFont(font); + + + + + + +frameRate(15); + + + +// Use today as the current day + + + +setDate(maxDateIndex); + + +} + + + + + + + + + +void setupTeams() { + + + +String[] lines = loadStrings("teams.tsv"); + + + + + + + +teamCount = lines.length; + + + +teamCodes = new String[teamCount]; + + + +teamNames = new String[teamCount]; + + + +teamIndices = new HashMap(); + + + + + + + +for (int i = 0; i < teamCount; i++) { + + + +String[] pieces = split(lines[i], TAB); + + + +teamCodes[i] = pieces[0]; + + + +teamNames[i] = pieces[1]; + + + +teamIndices.put(teamCodes[i], new Integer(i)); + + + +} + + +} + + + + + + + + + + +int teamIndex(String teamCode) { + + + +Integer index = (Integer) teamIndices.get(teamCode); + + + +return index.intValue(); + + +} + + + + + + + + +void setupSalaries() { + + + +String[] lines = loadStrings("salaries.tsv"); + + + +salaries = new SalaryList(lines); + + +} + + + + + + + + +void setupStandings() { + + + +String[] lines = loadStrings("http://benfry.com/book/salaryper/mlb.cgi"); + + + +int dataCount = lines.length / teamCount; + + + +int expectedCount = (maxDateIndex - minDateIndex) + 1; + + + +if (dataCount < expectedCount) { + + + +println("Found " + dataCount + " entries in the data file, " + + + + +"but was expecting " + expectedCount + " entries."); + + + +maxDateIndex = minDateIndex + dataCount - 1; + + + +} + + + +season = new StandingsList[maxDateIndex + 1]; + + + +for (int i = 0; i < dataCount; i++) { + + + +String[] portion = subset(lines, i*teamCount, teamCount); + + + +season[i+minDateIndex] = new StandingsList(portion); + + + +} + + +} + + + + + + + + + +void setupRanking() { + + + +standingsPosition = new Integrator[teamCount]; + + + +for (int i = 0; i < teamCodes.length; i++) { + + + +standingsPosition[i] = new Integrator(i); + + + +} + + +} + + + + + + + + + +void setupLogos() { + + + +logos = new 
PImage[teamCount]; + + + +for (int i = 0; i < teamCount; i++) { + + + +logos[i] = loadImage("small/" + teamCodes[i] + ".gif"); + + + +} + + + +logoWidth = logos[0].width / 2.0f; + + + +logoHeight = logos[0].height / 2.0f; + + +} + + + + + + + + + + +public void draw() { + + + +background(255); + + + +smooth(); + + + + + + +drawDateSelector(); + + + + + + +translate(SIDE_PADDING, TOP_PADDING); + + + + + + + +boolean updated = false; + + + +for (int i = 0; i < teamCount; i++) { + + + +if (standingsPosition[i].update()) { + + + +updated = true; + + + +} + + + +} + + + +if (!updated) { + + + +noLoop(); + + + +} + + + + + + +for (int i = 0; i < teamCount; i++) { + + + +float standingsY = standingsPosition[i].value * ROW_HEIGHT + HALF_ROW_HEIGHT; + + + + + + +image(logos[i], 0, standingsY - logoHeight/2, logoWidth, logoHeight); + + + + + + + +textAlign(LEFT, CENTER); + + + +text(teamNames[i], 28, standingsY); + + + + + + +textAlign(RIGHT, CENTER); + + + +fill(128); + + + +text(standings.getTitle(i), 150, standingsY); + + + + + + +float weight = map(salaries.getValue(i), + + + +salaries.getMinValue(), salaries.getMaxValue(), + + + +0.25f, 6); + + + +strokeWeight(weight); + + + + + + + +float salaryY = salaries.getRank(i)*ROW_HEIGHT + HALF_ROW_HEIGHT; + + + +if (salaryY >= standingsY) { + + + +stroke(33, 85, 156); // Blue for positive (or equal) difference. + + + +} else { + + + +stroke(206, 0, 82); // Red for wasting money. + + + +} + + + + + + + +line(160, standingsY, 325, salaryY); + + + + + + +fill(128); + + + +textAlign(LEFT, CENTER); + + + +text(salaries.getTitle(i), 335, salaryY); + + + +} + + +} + + + + + + + + + +// . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
+ + + + + + + + +int dateSelectorX; + + +int dateSelectorY = 30; + + + + + +// Draw a series of lines for selecting the date + + +void drawDateSelector() { + + + +dateSelectorX = (width - dateCount*2) / 2; + + + + + + +strokeWeight(1); + + + +for (int i = 0; i < dateCount; i++) { + + + +int x = dateSelectorX + i*2; + + + + + + +// If this is the currently selected date, draw it differently + + + +if (i == dateIndex) { + + + +stroke(0); + + + +line(x, 0, x, 13); + + + +textAlign(CENTER, TOP); + + + +text(datePretty[dateIndex], x, 15); + + + + + + +} else { + + + +// If this is a viewable date, make the line darker + + + +if ((i >= minDateIndex) && (i <= maxDateIndex)) { + + + +stroke(128); // Viewable date + + + +} else { + + + +stroke(204); // Not a viewable date + + + +} + + + +line(x, 0, x, 7); + + + +} + + + +} + + +} + + + + + + + + +void setDate(int index) { + + + +dateIndex = index; + + + +standings = season[dateIndex]; + + + + + + +for (int i = 0; i < teamCount; i++) { + + + +standingsPosition[i].target(standings.getRank(i)); + + + +} + + + +// Re-enable the animation loop + + + +loop(); + + +} + + + + + + + + +void mousePressed() { + + + +handleMouse(); + + +} + + + + + + +void mouseDragged() { + + + +handleMouse(); + + +} + + + + + + +void handleMouse() { + + + +if (mouseY < dateSelectorY) { + + + +int date = (mouseX - dateSelectorX) / 2; + + + +setDate(constrain(date, minDateIndex, maxDateIndex)); + + + +} + + +} + + + + + + + + +void keyPressed() { + + + +if (key == CODED) { + + + +if (keyCode == LEFT) { + + + +int newDate = max(dateIndex - 1, minDateIndex); + + + +setDate(newDate); + + + + + + +} else if (keyCode == RIGHT) { + + + +int newDate = min(dateIndex + 1, maxDateIndex); + + + +setDate(newDate); + + + +} + + + +} + + +} + + + + + + + + +// . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
+ + + + + + + + +class SalaryList extends RankedList { + + + + + + + +SalaryList(String[] lines) { + + + +super(teamCount, false); + + + + + + + +for (int i = 0; i < teamCount; i++) { + + + +String pieces[] = split(lines[i], TAB); + + + + + + + +// First column is the team 2-3 digit team code. + + + +int index = teamIndex(pieces[0]); + + + + + + + +// Second column is the salary as a number. + + + +value[index] = parseInt(pieces[1]); + + + + + + + +// Make the title in the format $NN,NNN,NNN + + + +int salary = (int) value[index]; + + + +title[index] = "$" + nfc(salary); + + + +} + + + +update(); + + + +} + + +} + + + + + + + + + +// . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + + + + + + + + +class StandingsList extends RankedList { + + + + + + + +StandingsList(String[] lines) { + + + +super(teamCount, false); + + + + + + + +for (int i = 0; i < teamCount; i++) { + + + +String[] pieces = split(lines[i], TAB); + + + +int index = teamIndex(pieces[0]); + + + +int wins = parseInt(pieces[1]); + + + +int losses = parseInt(pieces[2]); + + + + + + + +value[index] = (float) wins / (float) (wins+losses); + + + +title[index] = wins + "\u2013" + losses; + + + +} + + + +update(); + + + +} + + + + + + +float compare(int a, int b) { + + + +// First compare based on the record of both teams + + + +float amt = super.compare(a, b); + + + +// If the record is not identical, return the difference + + + +if (amt != 0) return amt; + + + + + + +// If records are equal, use salary as tie-breaker. + + + +// In this case, a and b are switched, because a higher + + + +// salary is a negative thing, unlike the values above. 
+ + + +return salaries.compare(a, b); + + + +} + + +} + +Next Steps + +This can be taken in several directions from here, from adaptations of the current data set to using other data sets or other types of correlations: + + +Expand the data to a longer series + + +Expand this to do parallel coordinates (show different variables as columns, not just win/loss but other stats) + + +etc. etc. + + +I think this is something to return to once the other chapters are finished, so that it's clear what needs more coverage. + + + + #!/bin/sh cd work && ./processing && cd .. diff --git a/build/shared/revisions.txt b/build/shared/revisions.txt index 7f6d79e4d..3195bd2d6 100644 --- a/build/shared/revisions.txt +++ b/build/shared/revisions.txt @@ -28,10 +28,23 @@ Throwing out the baby with the bathwater. Capture(PApplet parent, int width, int height, int rate) though we'll be leaving it in because it was used so heavily. -+ Changing key command to instead use ctrl/cmd + alt + left/right, - identical to Firefox on Mac, PC and Linux. Also fixes a bug - that caused French keyboards to have problems since 0123. - http://dev.processing.org/bugs/show_bug.cgi?id=480 ++ The split() method has been altered to avoid inconsistency. In fact, + it has split itself into two methods. Har har har. + split(String s) has been removed; use splitTokens(String s) instead. + split(String s, char c) remains unchanged. + split(String s, String delim) is now splitTokens(String s, String delim) + The new model is that split() breaks a String into pieces given a single + delimiter, which can be a single character or many characters. For instance, + split("Something<br>in<br><br>HTML.", "<br>") + now produces the array "Something", "in", "", "HTML". + The split() method breaks the String whenever it sees the delimiter, + so a <br> next to another <br>
produces an empty String. + On the other hand, splitTokens() takes a list of possible delimiters, + and will consume one or more of any of them. For instance: + splitTokens("really, messed ,up with ,,, commas,", ", "); + will produce the array "really", "messed", "up", "with", "commas". + Because any group of comma or space characters (and in any order) + will simply be consumed as a single break. + Blending modes have been finalized for 1.0, thanks to the help of Dave Bollinger, who contributed code to complete the set: @@ -41,8 +54,16 @@ Throwing out the baby with the bathwater. + At Casey's urging, the "Color Picker" is now called the "Color Selector". ++ The constant CENTER_RADIUS is now simply RADIUS. Both still work, + but CENTER_RADIUS should be avoided. + [ bug fixes ] ++ Changing the shortcut keys for switching tabs to instead use + ctrl/cmd + alt + left/right, identical to Firefox on Mac, PC and Linux. + Also fixes a bug that caused French keyboards to have problems since 0123. + http://dev.processing.org/bugs/show_bug.cgi?id=480 + + Properly refresh code when using an external editor (regression) http://dev.processing.org/bugs/show_bug.cgi?id=515 diff --git a/core/todo.txt b/core/todo.txt index 76933cafd..0562fe003 100644 --- a/core/todo.txt +++ b/core/todo.txt @@ -30,6 +30,7 @@ _ remove CENTER_RADIUS from any p5 code (i.e. examples) X split() inconsistency (emailed casey, will discuss later) X make split(String, String) behave like String.split(String) X and make current split(String) into splitTokens(String) +X that means split(String) no longer exists _ add splitTokens() documentation _ document new version of split() and regexp _ should we mention String.split? 
@@ -63,13 +64,13 @@ X or textAlign(LEFT, MIDDLE); -> this one seems best X add reference for new param, and update keywords.txt X given to andy -0125pX (will be 3) +0125p3 (in progress) X PImage.save() method is not working with get() X http://dev.processing.org/bugs/show_bug.cgi?id=558 X NullPointerException in Create Font with "All Characters" enabled X http://dev.processing.org/bugs/show_bug.cgi?id=564 X added min() and max() for float and int arrays -_ need to update reference +X need to update reference X moved around min/max functions X opengl image memory leaking X when creating a new PImage on every frame, slurps a ton of memory @@ -77,6 +78,9 @@ X workaround is to write the code properly, but suggests something bad X http://dev.processing.org/bugs/show_bug.cgi?id=150 X registerSize() was registering as pre() instead X http://dev.processing.org/bugs/show_bug.cgi?id=582 +_ set() doesn't bounds check +_ this shouldn't actually be the case +_ http://dev.processing.org/bugs/show_bug.cgi?id=522 _ PGraphics problem with fillColor _ http://dev.processing.org/bugs/show_bug.cgi?id=468 @@ -172,9 +176,6 @@ _ textAlign(CENTER) with P3D and OPENGL produces messy result _ probably rounding error with the images _ http://dev.processing.org/bugs/show_bug.cgi?id=475 -_ set() doesn't bounds check -_ http://dev.processing.org/bugs/show_bug.cgi?id=522 - _ add to open() reference problems with mac _ need to use the 'open' command on osx _ or need to do this by hand diff --git a/todo.txt b/todo.txt index 81f40a47e..e1c2ea00a 100644 --- a/todo.txt +++ b/todo.txt @@ -19,18 +19,20 @@ X http://dev.processing.org/bugs/show_bug.cgi?id=515 X preprocessor cannot handle L or l added to 'long' values X http://dev.processing.org/bugs/show_bug.cgi?id=492 X change constructors for Capture, also framerate to frameRate -_ need to update reference +X need to update reference +_ need to update example to use proper ordering X re-architect svg to properly inherit fill/stroke/etc from parents X 
object can specify fill/stroke for everyone below X need to discern between having a fill specified and one not being present -0125p3 +0125p3 (in progress) _ moviemaker X moved around constructors (to be on par with other video lib stuff) X cleaned up constant names (i.e. MSVideo -> MS_VIDEO) X added constant for h264 encoding _ add documentation +_ add() or addFrame()? X find/replace - replace should do auto find next(?) X or have a replace & find button X placing "replace" next to "find" ... (hitting "replace all" by accident) @@ -41,6 +43,16 @@ X currently it's rebuilding whenever "save" called too X http://dev.processing.org/bugs/show_bug.cgi?id=357 X ignore ._ files when reading jar and zip X Ignoring /Users/fry/coconut/processing/build/macosx/work/libraries/opengl/library/._jogl-natives-linux-i586.jar (error in opening zip file) +X look into deleting from p5 bugs db +X stop button kills the sketch window when running externally +X in Capture, if user cancels prompt, throws a '-128,userCanceledErr' +X in which case, need to return null (or ""?) for the prompt +X which will also just give you the last camera +X should it be new Camera(PROMPT); +o when passing in 'null' as the capture, dialog pops up fine +o but the applet craps out after a few seconds (pinwheel spin) +X can't confirm this one + new bugs _ huge jar files from 0124 export @@ -51,15 +63,19 @@ _ http://dev.processing.org/bugs/show_bug.cgi?id=562 0126 _ do the big move to multiple sketches open + +0127 +_ dynamic tools menu (requires moving files around) +_ this means can integrate the autoformat stuff _ finish up debian package support (see the processing.mess folder) + _ xml element needs to be readable from other charsets _ same with the other methods like loadStrings() _ could also be a way to handle gzip too? _ tho charset + gzip would be a problem _ hint(ENABLE_AUTO_GUNZIP) or rather hint(DISABLE_AUTO_GUNZIP) -_ look into deleting from p5 bugs db _ ? 
doesn't work with find in reference _ actually came up as 'null' @@ -168,15 +184,6 @@ _ http://dev.processing.org/bugs/show_bug.cgi?id=396 _ can draw() not be run on awt event thread? _ look into opengl stuff for dealing with this -official instructions for removing winvdig -from http://www.videoscript.com/forum/viewtopic.php?t=21 -First try to Uninstall WinVDIG from the Control Panel. Add/Remove Software. -If this does not work, or you wish to manually uninstall WinVDIG, try the following steps: -Remove VsVDIG.qtx (located in System32\QuickTime). -Remove VsDump.ax (located in System32). -Delete Program Files\WinVDIG -Restart you system - _ don't reload sketch on "save as" _ this can result in loss of data _ http://dev.processing.org/bugs/show_bug.cgi?id=433 @@ -246,9 +253,6 @@ _ or also add a method for getting the vectors? _ when running externally, set window frame title to the sketch name _ is this only a problem on macosx? -_ when drawing large video, the two triangles for the rect are out of sync -_ only shows up in P3D - processing wish list _ opening multiple versions of p5 at a time @@ -619,8 +623,6 @@ _ unchecking 'use external editor' sketch should not set modified _ dangerous if a version that hasn't been re-loaded has possibility _ to overwrite. i.e. make a change and save in external editor, _ don't actually -_ stop button won't kill a video sketch (bug 150 example does this) -_ although ESC seems to work? (not sure, didn't test) _ run/stop button highlight is almost completely broken _ http://dev.processing.org/bugs/show_bug.cgi?id=396 _ when running with external editor, hide the editor text area @@ -1032,6 +1034,7 @@ _ http://dev.processing.org/bugs/show_bug.cgi?id=496 _ figure out what's up with the qt error handler _ is this what's preventing the errors from being caught properly? _ shutting these off for 0116 so hopefully the messages improve +_ (could this be a mac issue with errors not making it through?) 
_ need to prevent multiple QTSession open or close _ static method shared across the lib, or some such _ reading movie is really really slow (2-3 fps) @@ -1041,28 +1044,27 @@ _ Movie needs the crop() functions ala Capture _ tearing and incomplete updates on capture? _ putting read() inside draw() seems to eliminate this? _ http://dev.processing.org/bugs/show_bug.cgi?id=114 -_ pause and framerate aren't working +_ when drawing large video, the two triangles for the rect are out of sync +_ only shows up in P3D +_ pause and frameRate aren't working _ framerate does set the frequency which movieEvent will be called, _ but it is not setting the "available" field corrrectly. +_ in fact, speed() should be used to set the rate, not frameRate _ sketch .zip file in casey's email message _ http://dev.processing.org/bugs/show_bug.cgi?id=370 _ wrong device name for video capture will cause a crash -_ when passing in 'null' as the capture, dialog pops up fine -_ but the applet craps out after a few seconds (pinwheel spin) _ couldn't get req'd component also happens when the capture isn't ready _ may also mean that no camera is plugged in _ also, don't mention winvdig on the mac -_ if user cancels prompt, throws a '-128,userCanceledErr' -_ in which case, need to return null (or ""?) for the prompt -_ which will also just give you the last camera -_ should it be new Camera(PROMPT); _ audio stops working after two seconds _ http://dev.processing.org/bugs/show_bug.cgi?id=277 -_ or audio won't stop even after hitting stop _ include a separate video class to handle just playback _ video playback can be much faster if not messing with pixels _ could instead use texsubimage in opengl, etc _ only supports tint() (to set alpha or color) and drawing? just drawing? +_ stop button won't kill a video sketch (bug 150 example does this) +X although ESC seems to work? 
(not sure, didn't test) +_ or audio won't stop even after hitting stop LIBRARIES / Serial diff --git a/video/src/processing/video/Capture.java b/video/src/processing/video/Capture.java index 2afc76c0f..0766a88a4 100755 --- a/video/src/processing/video/Capture.java +++ b/video/src/processing/video/Capture.java @@ -3,7 +3,7 @@ /* Part of the Processing project - http://processing.org - Copyright (c) 2004-06 Ben Fry and Casey Reas + Copyright (c) 2004-07 Ben Fry and Casey Reas The previous version of this code was developed by Hernando Barragan This library is free software; you can redistribute it and/or @@ -468,18 +468,24 @@ public class Capture extends PImage implements Runnable { // http://dev.processing.org/bugs/show_bug.cgi?id=366 channel.setBounds(qdrect); - // open the settings dialog + // Open the settings dialog (throws an Exception if canceled) channel.settingsDialog(); - // start the preview again - capture.startPreview(); } catch (StdQTException qte) { int errorCode = qte.errorCode(); - if (errorCode != Errors.userCanceledErr) { + if (errorCode == Errors.userCanceledErr) { + // User only canceled the settings dialog, continue as we were + } else { qte.printStackTrace(); - throw new RuntimeException("error inside Capture.settings()"); + throw new RuntimeException("Error inside Capture.settings()"); } } + try { + // Start the preview again (unreachable if newly thrown exception) + capture.startPreview(); + } catch (StdQTException qte) { + qte.printStackTrace(); + } } diff --git a/video/src/processing/video/Movie.java b/video/src/processing/video/Movie.java index 86dd6b4e1..917db04b0 100644 --- a/video/src/processing/video/Movie.java +++ b/video/src/processing/video/Movie.java @@ -1,10 +1,9 @@ /* -*- mode: jde; c-basic-offset: 2; indent-tabs-mode: nil -*- */ /* - PMovie - reading from video files Part of the Processing project - http://processing.org - Copyright (c) 2004-06 Ben Fry + Copyright (c) 2004-07 Ben Fry and Casey Reas The previous version of this 
code was developed by Hernando Barragan This library is free software; you can redistribute it and/or diff --git a/video/src/processing/video/MovieMaker.java b/video/src/processing/video/MovieMaker.java index ebc94a264..09dd1e2fb 100644 --- a/video/src/processing/video/MovieMaker.java +++ b/video/src/processing/video/MovieMaker.java @@ -52,6 +52,9 @@ import processing.core.*; * Library to create a QuickTime movie from a Processing pixel array. * Written by Daniel Shiffman. * Thanks to Dan O'Sullivan and Shawn Van Every. + *

+ * Please note that some constructors and variable names were altered + * slightly when the library was added to the Processing distribution. *
  * // Declare MovieMaker object
  * MovieMaker mm;
@@ -62,16 +65,17 @@ import processing.core.*;
  *   // Create MovieMaker object with size, filename,
  *   // compression codec and quality, framerate
  *   mm = new MovieMaker(this, width, height, "drawing.mov",
- *                       MovieMaker.H263, MovieMaker.HIGH,30);
+ *                       MovieMaker.H263, MovieMaker.HIGH, 30);
  *   background(160, 32, 32);
  * }
  *
  * void draw() {
- *   stroke(7,146,168);
+ *   stroke(7, 146, 168);
  *   strokeWeight(4);
+ *
  *   // Draw if mouse is pressed
  *   if (mousePressed) {
- *     line(pmouseX,pmouseY,mouseX,mouseY);
+ *     line(pmouseX, pmouseY, mouseX, mouseY);
  *   }
  *
  *   // Add window's pixels to movie