Today in History — Sorting Paragraphs with Leading Dates

At the site I’m at today, a comment was made about sorting a group of paragraphs from the syndicated Today in History column that is frequently used in newspapers. The way that column is sent across the wire, the lead paragraph has one element from some point in history. The site I’m at wants to combine that particular paragraph with the rest of the items, and then sort them all. The problem is that some of the dates are B.C. dates, and all B.C. dates should appear before all non-B.C. dates. If we were to sort these items in date order, they should appear this way:
  • 319 B.C.: Something happened.
  • 57: Something happened in the middle of those dates.
  • 1976: Something else happened.
  • 2001: The last important thing happened.
If these strings are placed in a normal ECMAScript, JavaScript, or ExtendScript array and sorted with the standard .sort() method, they are sorted as strings, and alphabetically they sort into this order:
  • 1976: Something else happened.
  • 2001: The last important thing happened.
  • 319 B.C.: Something happened.
  • 57: Something happened in the middle of those dates.
And, while that is sorted alphabetically, it isn't sorted numerically, let alone sorted by date.
If we could grab the first word, or first number, of each string, we could convert it to a number and sort numerically. That is actually quite easy. But if we did that, then this list would be sorted in this order:
  • 57: Something happened in the middle of those dates.
  • 319 B.C.: Something happened.
  • 1976: Something else happened.
  • 2001: The last important thing happened.
Here again, this isn't quite right. The 319 is a B.C. date. Thus, it should appear first, not second.
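As a rough sketch of the two orderings described above, here is plain JavaScript that should also run as ExtendScript. The helper name byLeadingNumber is made up for illustration; it relies on parseInt stopping at the first non-digit character.

```javascript
//-- The four sample items from above, in no particular order.
var items = [
    '1976: Something else happened.',
    '319 B.C.: Something happened.',
    '2001: The last important thing happened.',
    '57: Something happened in the middle of those dates.'
] ;

//-- Hypothetical helper: compare two strings by their leading number only.
//-- parseInt stops at the first non-digit, so '319 B.C.: ...' becomes 319.
function byLeadingNumber ( a, b ) {
    return parseInt ( a, 10 ) - parseInt ( b, 10 ) ;
}

//-- .slice ( 0 ) copies the array so the original order is left alone.
var alphabetical = items.slice ( 0 ).sort ( ) ;
var numeric = items.slice ( 0 ).sort ( byLeadingNumber ) ;
```

The alphabetical copy starts with the 1976 item, and the numeric copy puts 57 ahead of 319 B.C., which is the ordering problem the custom sort function has to solve.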

    To solve the problem, I created a user sort function to change the way the .sort() method works. The function is below...

    var unsorted = String ( '1964: something\r350 B.C.: more text\r1756: text.\r319 bc: stuff\r1500 no colon').split('\r') ;
    var sortedWrong = String ( '1964: something\r350 B.C.: more text\r1756: text.\r319 bc: stuff\r1500 no colon').split('\r').sort() ;
    var sortedCorrectly = String ( '1964: something\r350 B.C.: more text\r1756: text.\r319 bc: stuff\r1500 no colon').split('\r').sort(sortStringsWithLeadingDate) ;

    function sortStringsWithLeadingDate (arg1,arg2) {
    //-- S O R T S T R I N G S W I T H L E A D I N G D A T E
    //-- Generic: Yes. ECMA Script, JavaScript, ExtendScript
    //-- Purpose: This is a helper function to sort the elements of an array
    //-- such as those created from an array of paragraphs from stories such
    //-- as 'Today in History' -- a column frequently run in newspapers.
    //-- That text has dates that are followed by a colon, but sometimes
    //-- those dates are B.C. dates. B.C. dates should sort before the
    //-- non-B.C. dates. This custom function helps do that.
    //-- Returns: The standard return for a user sort function. 0 if the
    //-- first argument is the same as the second argument. A negative
    //-- number if the 2nd argument is greater than (or sorts after) the
    //-- 1st argument. A positive number if the opposite is true.
    //-- Calls: an internal function: determineDate which attempts to
    //-- return a Number for the date. B.C. dates are returned as
    //-- negative numbers, other dates as positive numbers. And when no
    //-- date can be determined, the original string will be passed back.
    //-- Standard use: Assume that you have an array of strings that looks
    //-- similar to the following:
    //-- var dts = ['1964: something happened\r','350 B.C.: something happened a long time ago\r','1756: Something fundamental happened.\r']
    //-- dts.sort(sortStringsWithLeadingDate)
    //-- // now the dts array will be sorted by date.
    //-- Written by Jon S. Winters of electronic publishing support
    //-- eps@electronicpublishingsupport.com
    //-- Written: 2009.07.02 in Newark, New Jersey
    //-- Use the internal function to attempt to get a number for the
    //-- lead date of each argument passed to this user sort function.
    var arg1Date = determineDate ( arg1 ) ;
    var arg2Date = determineDate ( arg2 ) ;
    //-- If both calls returned numbers, the sort only needs the
    //-- difference between the first number and the second number.
    var subResults = arg1Date - arg2Date ;
    if ( ! isNaN ( subResults ) ) {
        //-- The subtraction produced a real number, so use it.
        return subResults ;
    }
    //-- If here, the result was Not a Number (at least one argument had
    //-- no usable date), so try an alphabetical comparison. Both values
    //-- are converted to strings so that a number compared with undated
    //-- text is still compared alphabetically.
    var arg1Text = String ( arg1Date ) ;
    var arg2Text = String ( arg2Date ) ;
    if ( arg1Text == arg2Text ) {
        return 0 ;
    }
    else if ( arg1Text < arg2Text ) {
        return -1 ;
    }
    return 1 ;
    //-- end of main function body; the internal function follows
    function determineDate ( arg ) {
        //-- D E T E R M I N E D A T E
        //-- Generic: in concept, but very specific to the text to examine.
        try {
            //-- Using Regular Expressions, find all the text up to the first :
            //-- If that isn't found, .match() returns null and the [1]
            //-- generates an error.
            var lead = arg.match( RegExp ( '^(.+?):' ))[1] ;
            //-- From that lead text, find the year, which is a continuous
            //-- string of digits.
            var year = lead.match( RegExp ( '\\d+' ) )[0] ;
            //-- Now determine if that lead text also has a 'b' and a 'c' in
            //-- that order with at most 3 characters between them, so that
            //-- 'B.C.', 'BC', and 'bc' all match. If so, return a negative
            //-- value; if not, return a numeric version.
            if ( new RegExp ( 'b.{0,3}c', 'i' ).test( lead ) ) {
                return -year ;
            }
            return Number ( year ) ;
        }
        //-- In case any of that caused an error, return the original value.
        catch ( err ) {
            return arg ;
        }
    } //-- end of internal function
    } //-- end of sortStringsWithLeadingDate
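To make the idea of a signed sort key concrete, here is a small stand-alone sketch. It is not the author's determineDate; the function name leadingYearKey and the choice to return null for undated text are assumptions made for illustration. It shows which number each sample string would map to if B.C. years are treated as negative.

```javascript
//-- Hypothetical stand-alone key function: returns a negative year for
//-- B.C. dates, a positive year otherwise, and null when the text has
//-- no 'date:' lead.
function leadingYearKey ( text ) {
    var lead = /^(.+?):/.exec ( text ) ;
    if ( lead === null ) { return null ; }
    var digits = /\d+/.exec ( lead[1] ) ;
    if ( digits === null ) { return null ; }
    var year = Number ( digits[0] ) ;
    //-- 'B.C.', 'BC', 'b. c.' and similar all count as B.C.
    return /b\.?\s*c/i.test ( lead[1] ) ? -year : year ;
}

var samples = [ '1964: something', '350 B.C.: more text', '1756: text.', '319 bc: stuff', '1500 no colon' ] ;
var keys = [ ] ;
for ( var i = 0 ; i < samples.length ; i++ ) {
    keys.push ( leadingYearKey ( samples[i] ) ) ;
}
//-- keys is now [ 1964, -350, 1756, -319, null ]
```

Sorting on those keys puts -350 before -319 before 1756 before 1964, which is the date order wanted. What to do with the undated null entry is a separate decision.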


    Why not just use app.activeDocument

    The need for document references.
    In recent days I've posted two different generic ExtendScript functions for getting a document reference from an object.
    Why can't we just use:
    var docRef = app.activeDocument ;
    Simple: It doesn't always work. .activeDocument is the application property for the active document -- the document the user is using. However, when using Adobe InDesign Server, there is never an active document and thus it fails. So, without anything else going on, app.activeDocument doesn't always work, and if you can't count on it all of the time, you shouldn't use it.
    Well, if I can't count on it all the time I won't use it. Many of the ExtendScript scripts I create are for clients that I have never met, sometimes never even spoken to on the telephone. I need the scripts to work 100% of the time -- don't you?
    Well then, why not use:
    app.documents[0] ;
    That gives you the front document, which should be the one you are using. Yes, it generally does. But for that to work, you need to have the document open in a visible window. And one of the options for opening a file is to open it without displaying it. And if you don't display it, it isn't in front. And if it isn't in front, it won't be documents[0]. Thus, it too doesn't work 100% of the time. Do you want your automobile to only be drivable some of the days? Perhaps that is a bad analogy -- you might actually like it if you couldn't drive to work some days.
    There is another problem with both of these properties. Let's assume you have a group of Adobe InDesign documents open. Let's assume you are using a findObject search function. Let's assume you locate an object in one of your open documents. Which document is it? It likely isn't the .activeDocument or .documents[0], so you need a function like the recently posted generic functions to point you to the particular document.
    Here is another case where these two application properties won't work. Let's assume you are constructing a new document from an old document -- for example, I have a script I wrote for a client that replicates an ad stack from a converted QuarkXPress document (I think the client produced it with BrainWorks) to an Adobe InDesign document created from their current template. In this case we will always have two documents open and will need good static references to both of them. The documents don't have to switch between active and not (just because you can't see a document doesn't mean that you can't manipulate it), but even if both are visible, only one will ever be the .activeDocument. You can manipulate things in a document that isn't the active document or the first document.
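A minimal sketch of the idea behind such a function, assuming (as in InDesign's object model) that each object exposes a .parent and a .reflect.name. The function name documentOf and the mock objects are hypothetical stand-ins so the walk can be followed outside InDesign; this is not the previously posted function.

```javascript
//-- Hypothetical helper: climb the .parent chain until a Document is found.
//-- Returns null if the chain ends or reaches the Application first.
function documentOf ( obj ) {
    var current = obj ;
    //-- A depth limit guards against a parent chain that never ends.
    for ( var depth = 0 ; depth < 50 && current ; depth++ ) {
        var typeName = current.reflect ? current.reflect.name : '' ;
        if ( typeName == 'Document' ) { return current ; }
        if ( typeName == 'Application' ) { return null ; }
        current = current.parent ;
    }
    return null ;
}

//-- Plain objects standing in for InDesign objects (assumed shapes).
var mockDoc = { reflect: { name: 'Document' }, name: 'Template.indt' } ;
var mockSpread = { reflect: { name: 'Spread' }, parent: mockDoc } ;
var mockFrame = { reflect: { name: 'TextFrame' }, parent: mockSpread } ;

var found = documentOf ( mockFrame ) ;   //-- the mockDoc object
```

Because the reference comes from the object itself, it does not matter which document happens to be active or in front.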

    Good document references allow your scripts to work reliably. And with automation, reliability is more important than raw speed.


    Read Tab Delimited Text File

    var aFile = File.openDialog ( 'Select a Tab delimited file to parse:', '*.txt', false ) ;
    var fileData = readTabDelimitedFile ( aFile ) ;
    for ( var dIndex = 0 ; dIndex < fileData.length ; dIndex++ ) {
        //-- Do what you want with the data on a line by line basis.
        //-- This will write it to the JavaScript Console with
        //-- an ugly '<tab>' indicator to show you where the
        //-- tabs were in the original file.
        $.writeln( fileData[dIndex].join ('<tab>' ) ) ;
    }
    function readTabDelimitedFile ( fPath ) {
    //-- R E A D T A B D E L I M I T E D F I L E
    //-- Generic: Yes for all versions of ExtendScript with Adobe InCopy and
    //-- Adobe InDesign. Does not work with browser based JavaScript as
    //-- there is no File object. The File Object is one of the things
    //-- that makes ExtendScript not the same as JavaScript.
    //-- Purpose: To read a tab delimited file at the passed 'fPath' and
    //-- return an array of arrays. The main array will contain a sub
    //-- array for each line of the file, holding that line's tab
    //-- delimited values. The file can be ASCII or Unicode encoded.
    //-- Note, if the file has blank lines or lines without tabs, those
    //-- lines of the file will be ignored. This allows you to have
    //-- commented and empty lines in the file.
    //-- Parameters: fPath a full path to the file.
    //-- Returns: An array of arrays if a tab delimited file is successfully
    //-- read by the function. Returns an empty array if there were
    //-- problems. Because of this, you can successfully loop through
    //-- the array elements if there were issues reading the file.
    //-- Calls: nothing.
    //-- Sample Use:
    //-- var aFile = File.openDialog ( 'Select a Tab delimited file to parse:', '*.txt', false ) ;
    //-- var fileData = readTabDelimitedFile ( aFile ) ;
    //-- for ( var dIndex = 0 ; dIndex < fileData.length ; dIndex++ ) {
    //-- var activeLineArray = fileData[dIndex] ;
    //-- //-- Do what you want with that subarray
    //-- }
    //-- Written by Jon S. Winters of electronic publishing support from
    //-- scratch on 29 June 2009.
    //-- eps@electronicpublishingsupport.com

    //-- Setup the result of the function. When errors occur the function
    //-- should return an empty array.
    var returnArray = new Array ( ) ;

    //-- Verify that the file exists
    var fileObject = File ( fPath ) ;
    if ( ! fileObject.exists ) {
        return returnArray ; // an empty array because the file doesn't exist.
    }
    //-- Create a regular expression for a tab.
    var tabExpression = new RegExp ( '\\t' ) ;

    //-- Read the file.
    try {
        //-- The file has to be open.
        fileObject.open ('r') ; //-- Open for reading.

        //-- repeat until eof (End Of File) or an error
        while ( ! fileObject.eof ) {
            //-- Read one line, and only one line.
            var currentLine = fileObject.readln () ;
            //-- verify that the line contains at least one tab
            //-- The .test() is my favorite way of using regular
            //-- expressions as it returns true or false depending on
            //-- whether the string contains the pattern. The 'confusing'
            //-- thing about .test is that in the code it almost
            //-- reads backwards. Because you think you want to
            //-- know if the string has the pattern, but with
            //-- .test() you ask the regular expression if the
            //-- string will create a match.
            if ( tabExpression.test( currentLine ) ) {
                //-- Break the line into tab delimited parts and put that
                //-- array into the end of the array to return.
                //-- This will remove the tabs from the string and only
                //-- include the text between the tabs.
                returnArray.push(currentLine.split ('\t')) ;
            }
        }
        //-- if we didn't error, we need to close the file
        fileObject.close() ;
    }
    //-- If there was an error reading the file,
    //-- then return the empty array.
    catch (errMain) {
        returnArray = new Array ( ) ; //-- discard any partial results
        try {
            //-- an error was generated, try to close the file one more time
            fileObject.close() ;
        }
        //-- if the close generates an error skip it.
        catch (errInner ) { /* nothing here */ }
    }
    return returnArray ;
    } //-- End of function
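The filtering rule described in the comments (lines without a tab are skipped, every other line becomes a sub array) can be tried without a file. The sketch below is plain JavaScript working on an in-memory string; the function name parseTabDelimitedText and the sample text are made up for illustration and are not part of the function above.

```javascript
//-- Hypothetical helper: apply the same line rule to a string instead of
//-- a File object. Lines without a tab are ignored.
function parseTabDelimitedText ( text ) {
    var rows = [ ] ;
    var lines = text.split ( /\r\n|\r|\n/ ) ;
    for ( var i = 0 ; i < lines.length ; i++ ) {
        if ( lines[i].indexOf ( '\t' ) >= 0 ) {
            rows.push ( lines[i].split ( '\t' ) ) ;
        }
    }
    return rows ;
}

//-- Two data lines, one empty line, and one comment line without tabs.
var sample = 'key\tvalue\n\n# a comment line without tabs\nalpha\t1\n' ;
var rows = parseTabDelimitedText ( sample ) ;
//-- rows is [ [ 'key', 'value' ], [ 'alpha', '1' ] ]
```

Every value comes back as a string, so a column of numbers still needs Number ( ) or parseInt ( ) before any arithmetic.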

    Document, Page, and Spread References of Frame

    function docPageSpreadOfFrame (theFrame) {
    //-- D O C P A G E S P R E A D O F F R A M E
    //-- Generic: Yes, for Adobe InDesign. Tested with CS3 but should work
    //-- with CS2 through CS4
    //-- Purpose: To return a reference to the document, page, and spread
    //-- for the passed frame.
    //-- The need for referencing a page and a spread is because an item
    //-- on the pasteboard won't be on a page. If you are creating new
    //-- objects nearby, you need to know where the object is.
    //-- Returns: An object of 4 properties:
    //-- objectDoc: The Document Object for the frame
    //-- objectPageNum: The page number from the front of the document
    //-- objectPageRef: A reference to that page
    //-- objectSpreadRef: A reference to the spread of the object.
    //-- Calls: Nothing.
    //-- Written: 2008.08.28 by Jon S. Winters
    //-- Edited: 2009.01.21 by Jon S. Winters for version 2n to return object.
    //-- © 2009 electronic publishing support. All rights reserved.
    //-- How it works:
    //-- Assume we have an item on a page. If instead it is on a spread, then
    //-- the normal method of finding the document (looking at the parent)
    //-- will instead find the application. If so, then back down.
    //-- Note, we will assume the page the item goes on is the right hand page
    //-- of the spread. This is only if the item doesn't appear on a page
    //-- Version 2.05 put in controls so that it will return if a frame
    //-- was not passed.

    if ( '|Group|TextFrame|GraphicLine|Oval|Polygon|Rectangle|'.indexOf ('|' + theFrame.reflect.name + '|' , 0 ) < 0 ) {
        alert ( "The function 'docPageSpreadOfFrame' was not passed a frame." ) ;
        return null ;
    }
    //-- Verify that we do not have an anchored item. 2.05
    if (theFrame.parent.reflect.name == 'Character' ) {
        theFrame = theFrame.parent.parentStory.textContainers[0] ;
    }
    //-- Version 2.05 method works when theFrame is part of a group or
    //-- even if it is buried deeper.
    var objRef = theFrame ;
    var objRefParent = objRef.parent ;
    while ( objRefParent.reflect.name != 'Spread' ) {
        objRef = objRefParent ;
        objRefParent = objRef.parent ;
    }
    //-- At this point, the type of objRef is still unknown, but we know the spread details

    //-- Keep the spreadReference and build the doc reference.
    var spreadRef = objRefParent ;
    //-- Get the doc from the spread using the normal method.
    var theDocOfTheFrame = spreadRef.parent ;

    //-- Now check to see if the object before the spread is a page
    if (objRef.reflect.name == 'Page' ) {
        //-- on a page
        var pageRef = objRef ;
        var thePageNumOfTheFrame = pageRef.documentOffset + 1 ;
    }
    else {
        //-- likely a group or something else
        //-- The below 'Page' is fake, but it works in most cases
        var pageRef = spreadRef ;
        var lastPageOfSpread = spreadRef.pages.item( (spreadRef.pages).length - 1 ) ;
        var thePageNumOfTheFrame = lastPageOfSpread.documentOffset + 1 ;
    }
    //-- Now setup a return object
    return {objectDoc:theDocOfTheFrame , objectPageNum:thePageNumOfTheFrame , objectPageRef:pageRef , objectSpreadRef:spreadRef } ;
    } //-- End of Function
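The type check at the top of the function uses a delimiter-wrapped string as a cheap membership test. The sketch below isolates that trick in plain JavaScript; the helper name isOneOf is hypothetical, and the list of type names is the one used in the function above.

```javascript
//-- Hypothetical helper: true when typeName is one of the names in the
//-- pipe-delimited list. Wrapping the name in '|' forces a whole-name match.
function isOneOf ( typeName, delimitedList ) {
    return delimitedList.indexOf ( '|' + typeName + '|' ) >= 0 ;
}

var frameTypes = '|Group|TextFrame|GraphicLine|Oval|Polygon|Rectangle|' ;

var whole = isOneOf ( 'TextFrame', frameTypes ) ;        //-- true
var partial = isOneOf ( 'Frame', frameTypes ) ;          //-- false: '|Frame|' never appears
var undelimited = frameTypes.indexOf ( 'Frame' ) >= 0 ;  //-- true: why the delimiters matter
```

Without the surrounding pipes, a partial name such as 'Frame' would be accepted because it appears inside 'TextFrame'.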