ExtendScript Support: 6/28/09

2009-07-02

Today in History — Sorting Paragraphs with Leading Dates

At the site I’m at today, someone commented on sorting a group of paragraphs from the syndicated Today in History column that newspapers frequently run. The way that column is sent across the wire, the lead paragraph has one item from some point in history. This site wants to combine that particular paragraph with the rest of the items, but they need to sort them. The problem is that some of the dates are B.C. dates, and all B.C. dates should appear before all non-B.C. dates. If we were to sort these paragraphs in date order, they should appear this way:

319 B.C.: Something happened.
57: Something happened in the middle of those dates.
1976: Something else happened.
2001: The last important thing happened.

If these strings are placed in a normal ECMAScript, JavaScript, or ExtendScript array and sorted with the standard .sort() method, they are compared as strings, and alphabetically they sort into this order:

1976: Something else happened.
2001: The last important thing happened.
319 B.C.: Something happened.
57: Something happened in the middle of those dates.

And, while that is sorted alphabetically, it isn't sorted numerically, let alone sorted by date.
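A quick way to see this is to run the four example lines through .sort() with no compare function. ECMAScript then compares the elements as strings, character by character, so the first digits ('1' < '2' < '3' < '5') decide the order. A minimal sketch:

```javascript
// The four example paragraphs, in date order.
var paragraphs = [
    '319 B.C.: Something happened.',
    '57: Something happened in the middle of those dates.',
    '1976: Something else happened.',
    '2001: The last important thing happened.'
];

// With no compare function, .sort() compares the elements as strings.
// slice() makes a copy so the original array is left untouched.
var alphabetical = paragraphs.slice().sort();

// alphabetical now holds:
//   '1976: Something else happened.'
//   '2001: The last important thing happened.'
//   '319 B.C.: Something happened.'
//   '57: Something happened in the middle of those dates.'
```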

If we could grab the first word, or first number, we could convert it to a number and sort numerically. That is actually quite easy. But if we did that, this list would be sorted in this order:

57: Something happened in the middle of those dates.
319 B.C.: Something happened.
1976: Something else happened.
2001: The last important thing happened.
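The "first number" approach can be sketched as a compare function that reads the leading digits of each string with parseInt and subtracts them. This only illustrates the naive approach described above (the name compareLeadingNumber is hypothetical); it ignores the B.C. marker, which is exactly why 319 B.C. lands second.

```javascript
// Naive compare function: sort by the leading number only.
// parseInt reads digits from the start of the string and stops at the
// first non-digit, so '319 B.C.: ...' yields 319 and '57: ...' yields 57.
function compareLeadingNumber(a, b) {
    return parseInt(a, 10) - parseInt(b, 10);
}

var paragraphs = [
    '319 B.C.: Something happened.',
    '57: Something happened in the middle of those dates.',
    '1976: Something else happened.',
    '2001: The last important thing happened.'
];

// Numeric order, but the B.C. date is treated as a positive year.
var byNumber = paragraphs.slice().sort(compareLeadingNumber);
// byNumber: 57, 319 B.C., 1976, 2001
```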

Here again, this isn't quite right. The 319 is a B.C. date. Thus, it should appear first, not second.

To solve the problem, I created a user sort (compare) function that changes the way the .sort() method orders the elements. The function is below:

var unsorted = String ( '1964: something\r350 B.C.: more text\r1756: text.\r319 bc: stuff\r1500 no colon').split('\r') ;

var sortedWrong = String ( '1964: something\r350 B.C.: more text\r1756: text.\r319 bc: stuff\r1500 no colon').split('\r').sort() ;

var sortedCorrectly = String ( '1964: something\r350 B.C.: more text\r1756: text.\r319 bc: stuff\r1500 no colon').split('\r').sort(sortStringsWithLeadingDate) ;

function sortStringsWithLeadingDate (arg1,arg2) {

//-------------------------------------------------------------------------

//-- S O R T S T R I N G S W I T H L E A D I N G D A T E

//-------------------------------------------------------------------------

//-- Generic: Yes. ECMA Script, JavaScript, ExtendScript

//-------------------------------------------------------------------------

//-- Purpose: This is a helper function to sort the elements of an array,

//-- such as those created from the paragraphs of stories such

//-- as 'Today in History' -- a column frequently run in newspapers.

//-- That text has dates that are followed by a colon, but sometimes

//-- those dates are B.C. dates. B.C. dates should sort before the

//-- non-B.C. dates. This custom function helps do that.

//-------------------------------------------------------------------------

//-- Returns: The standard return for a user sort function: 0 if the

//-- first argument is the same as the second argument, a negative

//-- number if the 1st argument sorts before the 2nd, and a positive

//-- number if the opposite is true.

//-------------------------------------------------------------------------

//-- Calls: an internal function, determineDate, which attempts to

//-- return a Number for the date. B.C. dates are returned as

//-- negative numbers, other dates as positive numbers. When no

//-- date can be determined, the original string is passed back.

//-------------------------------------------------------------------------

//-- Standard use: Assume that you have an array of strings that looks

//-- similar to the following:

//-- var dts = ['1964: something happened\r','350 B.C.: something happened a long time ago\r','1756: Something fundamental happened.\r']

//-- dts.sort(sortStringsWithLeadingDate)

//-- // now the dts array will be sorted by date.

//-------------------------------------------------------------------------

//-- Written by Jon S. Winters of electronic publishing support

//-- eps@electronicpublishingsupport.com

//-- Written: 2009.07.02 in Newark, New Jersey

//-------------------------------------------------------------------------

//-- Use the internal function to attempt to get a number for the

//-- lead date of each argument passed to this user sort function.

var arg1Date = determineDate ( arg1 ) ;

var arg2Date = determineDate ( arg2 ) ;

//-- If both calls returned numbers, a user sort function only needs

//-- to return the difference between the first number and the

//-- second number.

var subResults = arg1Date - arg2Date ;

if ( ! isNaN ( subResults ) ) {

//-- The subtraction produced a usable number, so return it.

return subResults ;

}

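//-- If here, the subtraction was Not a Number (at least one argument

//-- had no recognizable date), so fall back to an alphabetical

//-- comparison. Both values are converted to strings so that a

//-- number compared with a string still gives a consistent order.

if ( String ( arg1Date ) == String ( arg2Date ) ) {

return 0 ;

}

else if ( String ( arg1Date ) < String ( arg2Date ) ) {

return -1 ;

}

return 1 ;

//-- end of the main function's logic; the internal helper follows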

function determineDate ( arg ) {

//---------------------------------------------------------------------

//-- D E T E R M I N E D A T E

//---------------------------------------------------------------------

//-- Generic: in concept, but very specific to the text to examine.

//---------------------------------------------------------------------

try {

//-- Using Regular Expressions, find all the text up to the first :

//-- If that isn't found the result can generate an error.

var lead = arg.match( RegExp ( '^(.+?):' ))[1] ;

//-- From that lead text, find the year, which is a continuous

//-- string of digits.

var year = lead.match( RegExp ( '\\d+' ) )[0] ;

//-- Now determine if that lead text also has a 'b' and a 'c' in

//-- that order, with at most three characters between them (B.C.,

//-- bc, B. C.). If so, return a negative value; if not, return

//-- a numeric version.

if ( new RegExp ( 'b.{0,3}c', 'i' ).test( lead ) ) {

return -year ;

}

return Number ( year ) ;

}

//-- in case any of that caused an error, return the original value

catch ( err ) {

return arg ;

}

}//-- end of internal function

}
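For comparison, here is a condensed sketch of the same idea that computes each paragraph's sort key once and then sorts on the keys (a decorate-sort-undecorate pattern), so the regular expressions are not re-run on every comparison. It is not the function above: the names leadingDateKey and sortByLeadingDate are hypothetical, and unlike the original it deliberately places paragraphs without a recognizable leading date after all dated ones, in their original relative order.

```javascript
// Hypothetical helper: returns the leading year of a paragraph as a Number,
// negative for B.C. dates, or null when no "date:" lead is found.
function leadingDateKey(text) {
    var lead = /^(.+?):/.exec(text);       // text up to the first colon
    if (!lead) { return null; }
    var digits = /\d+/.exec(lead[1]);      // first run of digits in that text
    if (!digits) { return null; }
    var year = Number(digits[0]);
    // 'b' followed by 'c' with at most three characters between them,
    // case-insensitive: matches 'B.C.', 'bc', 'B. C.' and similar.
    return /b.{0,3}c/i.test(lead[1]) ? -year : year;
}

// Hypothetical wrapper: decorate each paragraph with its key, sort, undecorate.
function sortByLeadingDate(paragraphs) {
    var decorated = [];
    for (var i = 0; i < paragraphs.length; i++) {
        decorated.push({ key: leadingDateKey(paragraphs[i]), index: i, text: paragraphs[i] });
    }
    decorated.sort(function (a, b) {
        // Undated paragraphs go after dated ones, keeping their original order.
        if (a.key === null && b.key === null) { return a.index - b.index; }
        if (a.key === null) { return 1; }
        if (b.key === null) { return -1; }
        // Equal dates fall back to the original position as a tie-breaker.
        return (a.key - b.key) || (a.index - b.index);
    });
    var result = [];
    for (var j = 0; j < decorated.length; j++) {
        result.push(decorated[j].text);
    }
    return result;
}

var sorted = sortByLeadingDate([
    '2001: The last important thing happened.',
    '57: Something happened in the middle of those dates.',
    '1976: Something else happened.',
    '319 B.C.: Something happened.'
]);
// sorted: 319 B.C., 57, 1976, 2001
```

The index tie-breaker is there because older ECMAScript engines do not guarantee a stable sort.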

2009-06-30

Why not just use app.activeDocument

The need for document references.
In recent days I've posted two different generic ExtendScript functions for getting a document reference from an object.
Why can't we just use:
var docRef = app.activeDocument ;
Simple: it doesn't always work. .activeDocument is the application property for the active document -- the document the user is working in. However, when using Adobe InDesign Server, there is never an active document, and thus it fails. So, even without anything else going on, app.activeDocument doesn't always work, and if you can't count on it all of the time, you shouldn't use it.
If I can't count on it all the time, I won't use it. Many of the ExtendScript scripts I create are for clients I have never met, and sometimes have never even spoken to on the telephone. I need the scripts to work 100% of the time -- don't you?
Well then, why not use:
app.documents[0] ;
That gives you the front document, which should be the one you are using. Yes, it generally does. But for that to work, you need to have the document open in a visible window. One of the options for opening a file is to open it without displaying it. If you don't display it, it isn't in front, and if it isn't in front, it won't be documents[0]. Thus, it too doesn't work 100% of the time. Do you want your automobile to be drivable only on some days? Perhaps that is a bad analogy -- you might actually like it if you couldn't drive to work some days.
There is another problem with both of these properties. Let's assume you have a group of Adobe InDesign documents open and you are using a findObject search. Let's assume you locate an object in one of your open documents. Which document is it? It likely isn't the .activeDocument or .documents[0], so you need a function like the recently posted generic functions to point you to the particular document.
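The idea behind such a function can be sketched as a walk up the .parent chain until an object whose reflect.name is 'Document' turns up. This is a minimal sketch, not either of the posted functions: the names documentOfObject and mock are hypothetical, and it assumes, as in the InDesign scripting DOM, that each object exposes .parent and .reflect.name with the Application at the top of the chain. Plain objects stand in for InDesign objects so it can be tried outside InDesign.

```javascript
// Hypothetical helper: walk up .parent until a Document is reached.
// Returns null if the chain reaches the Application (or ends) without one.
function documentOfObject(obj) {
    var current = obj;
    while (current && current.reflect) {
        if (current.reflect.name === 'Document') { return current; }
        if (current.reflect.name === 'Application') { return null; }
        current = current.parent;
    }
    return null;
}

// Hypothetical stand-ins mimicking InDesign's parent chain.
function mock(name, parent) {
    return { reflect: { name: name }, parent: parent };
}
var application = mock('Application', null);
var doc = mock('Document', application);
var spread = mock('Spread', doc);
var page = mock('Page', spread);
var frame = mock('TextFrame', page);

var found = documentOfObject(frame);   // the Document stand-in, doc
```

Because it only follows .parent, it does not care whether the document is active, in front, or even displayed in a window.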
Another case where these two application properties won't work: let's assume you are constructing a new document from an old document. For example, I wrote a script for a client that replicates an ad stack from a converted QuarkXPress document (I think the client produced it with BrainWorks) to an Adobe InDesign document created from their current template. In this case we will always have two documents open and need good static references to both of them. The documents don't have to switch between active and inactive (just because you can't see a document doesn't mean you can't be manipulating it), but even if both are visible, only one will ever be the .activeDocument. You can manipulate things in a document that isn't the active document or the first document.

Good document references allow your scripts to work reliably. And with automation, reliability is more important than raw speed.

2009-06-29

Read Tab Delimited Text File

Document, Page, and Spread References of Frame

//
function docPageSpreadOfFrame (theFrame) {
//-------------------------------------------------------------------------
//-- D O C P A G E S P R E A D O F F R A M E
//-------------------------------------------------------------------------
//-- Generic: Yes, for Adobe InDesign. Tested with CS3 but should work
//-- with CS2 through CS4
//-------------------------------------------------------------------------
//-- Purpose: To return a reference to the document, page, and spread
//-- for the passed frame.
//-- The need for referencing a page and a spread is because an item
//-- on the pasteboard won't be on a page. If you are creating new
//-- objects nearby, you need to know where the object is.
//-------------------------------------------------------------------------
//-- Returns: An object of 4 properties:
//-- objectDoc: The Document Object for the frame
//-- objectPageNum: The page number from the front of the document
//-- objectPageRef: A reference to that page
//-- objectSpreadRef: A reference to the spread of the object.
//-------------------------------------------------------------------------
//-- Calls: Nothing.
//-------------------------------------------------------------------------
//-- Written: 2008.08.28 by Jon S. Winters
//-- Edited: 2009.01.21 by Jon S. Winters for version 2n to return object.
//-- © 2009 electronic publishing support. All rights reserved.
//-------------------------------------------------------------------------
//-- How it works:
//-- Assume we have an item on a page. If instead it is on a spread, then
//-- the normal method of finding the document (looking at the parent)
//-- will instead find the application. If so, then back down.
//-- Note, we will assume the page the item goes on is the right hand page
//-- of the spread. This is only if the item doesn't appear on a page
//-------------------------------------------------------------------------
//-- Version 2.05 put in controls so that it will return if a frame
//-- was not passed.
//-------------------------------------------------------------------------

if ( '|Group|TextFrame|GraphicLine|Oval|Polygon|Rectangle|'.indexOf ('|' + theFrame.reflect.name + '|' , 0 ) < 0 ) {
alert ( "The function 'docPageSpreadOfFrame' was not passed a frame." ) ;
return null ;
}
//-- Verify that we do not have an anchored item. 2.05
if (theFrame.parent.reflect.name == 'Character' ) {
theFrame = theFrame.parent.parentStory.textContainers[0] ;
}
//-- Version 2.05 method works when theFrame is part of a group or
//-- even if it is buried deeper.
var objRef = theFrame ;
var objRefParent = objRef.parent ;
while ( objRefParent.reflect.name != 'Spread' ) {
objRef = objRefParent ;
objRefParent = objRef.parent ;
}
//-- At this point, the type of objRef is still unknown, but we know the spread.

//-- Keep the spreadReference and build the doc reference.
var spreadRef = objRefParent ;
//-- Get the doc from the frame using the normal method.
var theDocOfTheFrame = spreadRef.parent ;

//-- Now check to see if the object before the spread is a page
if (objRef.reflect.name == 'Page' ) {
//--on a page
var pageRef = objRef ;
var thePageNumOfTheFrame = pageRef.documentOffset + 1 ;
}
else {
//-- likely a group or something else
//-- The below 'Page' is fake, but it works in most cases
var pageRef = spreadRef ;
var lastPageOfSpread = spreadRef.pages.item( (spreadRef.pages).length - 1 ) ;
var thePageNumOfTheFrame = lastPageOfSpread.documentOffset + 1 ;
}
//-- Now setup a return object
return {objectDoc:theDocOfTheFrame , objectPageNum:thePageNumOfTheFrame , objectPageRef:pageRef , objectSpreadRef:spreadRef }
} //-- End of Function
//
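To see the spread-and-page logic in isolation, here is a minimal sketch of just the parent walk, run against plain stand-in objects rather than live InDesign objects. The names spreadAndPageOf and mockCollection are hypothetical, the frame-type check and anchored-item step are left out, and it assumes the behavior the function above relies on: a frame on a page has the Page as an ancestor just below the Spread, while a frame on the pasteboard has the Spread as its direct parent. Unlike the original, it returns null instead of failing when no Spread is found.

```javascript
// Hypothetical stand-in for an InDesign collection: .length plus .item(index).
function mockCollection(items) {
    return {
        length: items.length,
        item: function (index) { return items[index]; }
    };
}

// Sketch of the walk: climb .parent until the Spread, remembering the
// object just below it. Returns null if no Spread is found.
function spreadAndPageOf(frame) {
    var objRef = frame;
    var objRefParent = objRef.parent;
    while (objRefParent && objRefParent.reflect.name !== 'Spread') {
        objRef = objRefParent;
        objRefParent = objRef.parent;
    }
    if (!objRefParent) { return null; }
    var spreadRef = objRefParent;
    if (objRef.reflect.name === 'Page') {
        // The frame (or its group) sits on this page.
        return { doc: spreadRef.parent, spread: spreadRef, pageRef: objRef,
                 pageNum: objRef.documentOffset + 1 };
    }
    // Pasteboard: like the original, use the spread as the "page" and
    // report the number of the spread's last page.
    var lastPage = spreadRef.pages.item(spreadRef.pages.length - 1);
    return { doc: spreadRef.parent, spread: spreadRef, pageRef: spreadRef,
             pageNum: lastPage.documentOffset + 1 };
}

// Plain objects standing in for a document with one two-page spread.
var docMock = { reflect: { name: 'Document' } };
var spreadMock = { reflect: { name: 'Spread' }, parent: docMock };
var leftPage = { reflect: { name: 'Page' }, parent: spreadMock, documentOffset: 1 };
var rightPage = { reflect: { name: 'Page' }, parent: spreadMock, documentOffset: 2 };
spreadMock.pages = mockCollection([leftPage, rightPage]);

var onPage = { reflect: { name: 'TextFrame' }, parent: leftPage };
var inGroup = { reflect: { name: 'Oval' },
                parent: { reflect: { name: 'Group' }, parent: leftPage } };
var onPasteboard = { reflect: { name: 'Rectangle' }, parent: spreadMock };

var resultOnPage = spreadAndPageOf(onPage);           // pageRef: leftPage, pageNum: 2
var resultInGroup = spreadAndPageOf(inGroup);         // pageRef: leftPage, pageNum: 2
var resultPasteboard = spreadAndPageOf(onPasteboard); // pageRef: spreadMock, pageNum: 3
```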
