2009-07-02

Today in History — Sorting Paragraphs with Leading Dates

At the site I’m at today, a comment was made about sorting a group of paragraphs from the syndicated Today in History column that is frequently used in newspapers. The way that column is sent across the wire, the lead paragraph has one element from some point in history. This site wants to combine that lead paragraph with the rest of the items, and then sort them all. The problem is that some of the dates are B.C. dates, and all B.C. dates should appear before all non-B.C. dates. If we were to sort these dates in date order, they should appear this way:
  • 319 B.C.: Something happened.
  • 57: Something happened in the middle of those dates.
  • 1976: Something else happened.
  • 2001: The last important thing happened.
If these strings are placed in a normal ECMAScript, JavaScript, or ExtendScript array and sorted with the standard .sort() method, they are sorted as strings, and alphabetically they sort into this order:
  • 1976: Something else happened.
  • 2001: The last important thing happened.
  • 319 B.C.: Something happened.
  • 57: Something happened in the middle of those dates.
And, while that is sorted alphabetically, it isn't sorted numerically, let alone sorted by date.
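To see that default behavior in isolation, here is a minimal sketch using the four example strings from above. The variable names are only for illustration, and the code sticks to plain loops so it should also run in ExtendScript.

```javascript
// The four example entries, deliberately out of order.
var entries = [
    '57: Something happened in the middle of those dates.',
    '2001: The last important thing happened.',
    '319 B.C.: Something happened.',
    '1976: Something else happened.'
];

// With no compare function, .sort() compares the elements as strings,
// so '1976' sorts before '319' because '1' comes before '3'.
// .slice() makes a copy so the original array is left alone.
var alphabetical = entries.slice().sort();

// Collect just the leading numbers to make the resulting order easy to read.
var leadingNumbers = [];
for (var i = 0; i < alphabetical.length; i++) {
    leadingNumbers.push(parseInt(alphabetical[i], 10));
}
// leadingNumbers is now [1976, 2001, 319, 57]
```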
If we could grab the first word, or first number, of each string, we could convert it to a number and sort numerically. That is actually quite easy. But if we did that, the list would be sorted in this order:
  • 57: Something happened in the middle of those dates.
  • 319 B.C.: Something happened.
  • 1976: Something else happened.
  • 2001: The last important thing happened.
Here again, this isn't quite right. The 319 is a B.C. date. Thus, it should appear first, not second.
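The numeric-only approach can be sketched like this. It illustrates the idea rather than the final function: the name sortByLeadingNumber is made up for this example, and it assumes every string starts with digits.

```javascript
// Hypothetical helper: compare two strings by their leading number only.
// parseInt stops at the first non-digit, so '319 B.C.: ...' becomes 319.
function sortByLeadingNumber(a, b) {
    return parseInt(a, 10) - parseInt(b, 10);
}

var entries = [
    '1976: Something else happened.',
    '2001: The last important thing happened.',
    '319 B.C.: Something happened.',
    '57: Something happened in the middle of those dates.'
];

// .slice() copies the array so the original order is preserved.
var numeric = entries.slice().sort(sortByLeadingNumber);
// numeric now starts with the '57: ...' entry, followed by '319 B.C.: ...',
// which is the wrong place for a B.C. date.
```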

To solve the problem, I created a user sort function to change the way the .sort() method works. The function is below.

    //-- .slice() copies the array first because .sort() sorts in place.
    var unsorted = '1964: something\r350 B.C.: more text\r1756: text.\r319 bc: stuff\r1500 no colon'.split('\r') ;
    var sortedWrong = unsorted.slice().sort() ;
    var sortedCorrectly = unsorted.slice().sort(sortStringsWithLeadingDate) ;

    //
    function sortStringsWithLeadingDate (arg1,arg2) {
    //-------------------------------------------------------------------------
    //-- S O R T S T R I N G S W I T H L E A D I N G D A T E
    //-------------------------------------------------------------------------
    //-- Generic: Yes. ECMAScript, JavaScript, ExtendScript
    //-------------------------------------------------------------------------
    //-- Purpose: This is a helper function to sort the elements of an array
    //-- such as one created from the paragraphs of stories such
    //-- as 'Today in History' -- a column frequently run in newspapers.
    //-- That text has dates that are followed by a colon, but sometimes
    //-- those dates are B.C. dates. BC dates should sort before the non
    //-- BC dates. This custom function helps do that.
    //-------------------------------------------------------------------------
    //-- Returns: The standard return for a user sort function. 0 if the
    //-- first argument is the same as the second argument. A negative
    //-- number if the 2nd argument is greater than (or sorts after) the
    //-- 1st argument. A positive number if the opposite is true.
    //-------------------------------------------------------------------------
    //-- Calls: an internal function: determineDate which attempts to
    //-- return a Number for the date. BC dates are returned as a
    //-- negative number, all other dates as a positive number. And when
    //-- no date can be determined, the original string is passed back.
    //-------------------------------------------------------------------------
    //-- Standard use: Assume that you have an array of strings that looks
    //-- similar to the following:
    //-- var dts = ['1964: something happened\r','350 B.C.: something happened a long time ago\r','1756: Something fundamental happened.\r']
    //-- dts.sort(sortStringsWithLeadingDate)
    //-- // now the dts array will be sorted by date.
    //-------------------------------------------------------------------------
    //-- Written by Jon S. Winters of electronic publishing support
    //-- eps@electronicpublishingsupport.com
    //-- Written: 2009.07.02 in Newark, New Jersey
    //-------------------------------------------------------------------------
    //-- Use the internal function to attempt to get a number for the
    //-- lead date of each argument passed to this user sort function
    var arg1Date = determineDate ( arg1 ) ;
    var arg2Date = determineDate ( arg2 ) ;
    //-- If both calls returned numbers, then the standard method of
    //-- sorting only needs to return the difference between the first
    //-- number and the second number.
    var subResults = arg1Date - arg2Date ;
    if ( ! isNaN ( subResults ) ) {
    //-- the subtraction produced a real number (it is not NaN), so
    //-- the result can be used as is.
    return subResults ;
    }
    //-- if here then the result wasn't a number (at least one argument
    //-- had no usable date), so try an alphabetical comparison.
    if ( arg1Date == arg2Date ) {
    return 0 ;
    }
    else if ( arg1Date < arg2Date ) {
    return -1 ;
    }
    return 1 ;
    //-- end of main function
    function determineDate ( arg ) {
    //---------------------------------------------------------------------
    //-- D E T E R M I N E D A T E
    //---------------------------------------------------------------------
    //-- Generic: in concept, but very specific to the text to examine.
    //---------------------------------------------------------------------
    try {
    //-- Using Regular Expressions, find all the text up to the first :
    //-- If that isn't found, match() returns null and the [1] throws an
    //-- error, which is caught below.
    var lead = arg.match( RegExp ( '^(.+?):' ))[1] ;
    //-- From that lead text, find the year, which is a continuous
    //-- string of digits
    var year = lead.match( RegExp ( '\\d+' ) )[0] ;
    //-- Now determine if that lead text also has a 'b' and a 'c' in
    //-- that order, with at most three characters between them so that
    //-- 'B.C.', 'B. C.' and plain 'bc' all match. If so, return a
    //-- negative value, if not, return a numeric version.
    if ( new RegExp ( 'b.{0,3}c', 'i' ).test( lead ) ) {
    return -year ;
    }
    return Number ( year ) ;
    }
    //-- in case any of that caused an error, return the original value
    catch ( err ) {
    return arg ;
    }
    }//-- end of internal function
    }
    //
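One thing to keep in mind with the function above: an entry without a colon comes back from determineDate as a string, so comparing it to a numeric date falls through to the alphabetical comparison. A possible variation, sketched below, gives every entry a numeric key and pushes undated entries to the end. The names leadingYearKey and compareByLeadingYear are made up for this sketch, and sorting undated entries last is an assumption of the sketch, not something the site asked for.

```javascript
// Hypothetical variation: always return a number so the comparator never
// has to mix numbers and strings.
function leadingYearKey(text) {
    // Text before the first colon, e.g. '350 B.C.' or '1964'.
    var lead = /^(.+?):/.exec(text);
    // First run of digits within that lead text.
    var digits = lead ? /\d+/.exec(lead[1]) : null;
    if (!digits) {
        // Assumption for this sketch: undated entries sort last.
        return Infinity;
    }
    var year = Number(digits[0]);
    // 'B.C.', 'BC', 'b.c' and similar mark the year as negative.
    return /b\.?\s?c/i.test(lead[1]) ? -year : year;
}

function compareByLeadingYear(a, b) {
    var keyA = leadingYearKey(a);
    var keyB = leadingYearKey(b);
    if (keyA === keyB) {
        return 0; // also avoids Infinity - Infinity, which is NaN
    }
    return keyA < keyB ? -1 : 1;
}

var sample = ['1964: something', '350 B.C.: more text', '1756: text.',
              '319 bc: stuff', '1500 no colon'];
var ordered = sample.slice().sort(compareByLeadingYear);
// ordered: '350 B.C.: more text', '319 bc: stuff', '1756: text.',
//          '1964: something', '1500 no colon'
```

Returning -1, 0, or 1 instead of a difference keeps the comparator well defined even when both keys are Infinity.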
