Data Interchange Format Discrepancies in implementations 1 Some implementations (notably those of older Microsoft products) swapped the meaning of VECTORS and TUPLES. Some implementations are insensitive to errors in the dimensions of the table as written in the header and simply use the layout in the DATA section.
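
A tolerant reader of the kind described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `read_dif_data` and the sample text are hypothetical, and the type codes -1 (BOT/EOD directives) and 1 (quoted strings) are assumptions taken from common descriptions of DIF rather than from this text. The sketch ignores the VECTORS and TUPLES counts entirely and rebuilds the table from the DATA section alone.

```python
def read_dif_data(text):
    """Rebuild rows from the DATA section, ignoring header dimensions."""
    lines = text.splitlines()
    # Locate the DATA identifier; string values are quoted, so a bare
    # DATA line is assumed to be the chunk identifier.
    start = next(i for i, line in enumerate(lines) if line.strip() == "DATA")
    i = start + 3  # skip the identifier and its dummy two-line value
    rows = []
    while i + 1 < len(lines):
        kind, _, number = lines[i].strip().partition(",")
        second = lines[i + 1].strip()
        i += 2
        kind = int(kind)
        if kind == -1:  # assumed directive type: BOT or EOD
            if second == "EOD":
                break
            if second == "BOT":
                rows.append([])
            continue
        if not rows:  # value seen before any BOT: start a row anyway
            rows.append([])
        if kind == 0:  # numeric; only the V keyword is treated as valid here
            rows[-1].append(float(number) if second == "V" else None)
        elif kind == 1:  # assumed string type: drop the surrounding quotes
            rows[-1].append(second.strip('"'))
    return rows


# Hypothetical sample whose header claims 5 columns and 9 rows,
# while the DATA section actually holds 2 rows of 2 values.
SAMPLE = "\n".join([
    "TABLE", "0,1", '"EXAMPLE"',
    "VECTORS", "0,5", '""',
    "TUPLES", "0,9", '""',
    "DATA", "0,0", '""',
    "-1,0", "BOT", "1,0", '"Text"', "0,100", "V",
    "-1,0", "BOT", "1,0", '"More"', "0,2.5", "V",
    "-1,0", "EOD",
])

table = read_dif_data(SAMPLE)
```

Because the header counts are never consulted, the same function also reads files written by implementations that swapped VECTORS and TUPLES.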

Data Interchange Format History 1 DIF was developed by Software Arts, Inc. (the developers of the VisiCalc program) in the early 1980s. The specification was included in many copies of VisiCalc and published in Byte magazine. Bob Frankston developed the format, with input from others, including Mitch Kapor, who helped so that it could work with his VisiPlot program. (Kapor later went on to found Lotus, the company behind Lotus 1-2-3.) The specification is copyrighted 1981.

Data Interchange Format History 1 DIF was a registered trademark of Software Arts Products Corp. (a legal name for Software Arts at the time).

Data Interchange Format Syntax 1 On the other hand, data chunks start with a number pair and the next line is a quoted string or a keyword.

Data Interchange Format Values 1 A value occupies two lines, the first a pair of numbers and the second either a string or a keyword. The first number of the pair indicates type:

Data Interchange Format Values 1 BOT – beginning of tuple (start of row)

Data Interchange Format Values 1 0 – numeric type, value is the second number, the following line is one of these keywords:

Data Interchange Format 1 Data Interchange Format (.dif) is a text file format used to import and export single spreadsheets between spreadsheet programs (OpenOffice.org Calc, Excel, Gnumeric, StarCalc, Lotus 1-2-3, FileMaker, dBase, Framework, Multiplan, etc.). It is also known as "Navy DIF". One limitation is that a DIF file cannot hold multiple spreadsheets in a single workbook.

Data Interchange Format Header chunk 1 A header chunk is composed of an identifier line followed by the two lines of a value.

Data Interchange Format Header chunk 1 TABLE - followed by a numeric value giving the version; the otherwise unused second line of the value contains a generator comment

Data Interchange Format Header chunk 1 VECTORS - the number of columns follows as a numeric value

Data Interchange Format Header chunk 1 TUPLES - the number of rows follows as a numeric value

Data Interchange Format Header chunk 1 DATA - after a dummy 0 numeric value, the data for the table follow, each row preceded by a BOT value, the entire table terminated by an EOD value

Data Interchange Format Header chunk 1 The numeric values in header chunks use just an empty string instead of the validity keywords.
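
The header layout described above can be sketched as a small writer. This is a hedged illustration under stated assumptions: the function names `dif_value` and `write_dif` and the default comment "EXAMPLE" are hypothetical, the version number 1 is assumed, and the type codes -1 (directives) and 1 (strings) as well as the V validity keyword come from common descriptions of DIF rather than from this text. Following the description here, VECTORS carries the column count and TUPLES the row count. Quotes inside strings are not escaped.

```python
def dif_value(cell):
    """Return the two lines that encode one data value."""
    if isinstance(cell, (int, float)) and not isinstance(cell, bool):
        # Numeric type 0: the number is the second item of the pair,
        # followed by the (assumed) validity keyword V.
        return [f"0,{cell}", "V"]
    # Assumed string type 1: the second number is unused.
    return ["1,0", f'"{cell}"']


def write_dif(rows, comment="EXAMPLE"):
    """Serialize a list of rows as DIF text (header chunks, then data)."""
    ncols = max((len(row) for row in rows), default=0)
    lines = [
        "TABLE", "0,1", f'"{comment}"',    # version value plus generator comment
        "VECTORS", f"0,{ncols}", '""',     # column count, empty string line
        "TUPLES", f"0,{len(rows)}", '""',  # row count, empty string line
        "DATA", "0,0", '""',               # dummy 0 value before the data
    ]
    for row in rows:
        lines += ["-1,0", "BOT"]           # each row is preceded by BOT
        for cell in row:
            lines += dif_value(cell)
    lines += ["-1,0", "EOD"]               # the table is terminated by EOD
    return "\n".join(lines) + "\n"


dif_text = write_dif([["Text", 100], ["More", 2.5]])
```

Each header chunk occupies exactly three lines (identifier, number pair, string line), which is why the header portion of the output is always twelve lines long in this sketch.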

Comparison of spreadsheet software Remote data update 1 Some on-line spreadsheets provide remote data update allowing data values to be extracted from other users' spreadsheets even though they may be inactive at the time.

Comparison of spreadsheet software Remote data update 1 Gnumeric (listed below as a desktop spreadsheet) is also used as the "back-end" processor in at least one on-line spreadsheet application (Editgrid).

Computer file Organizing the data in a file 1 Information in a computer file can consist of smaller packets of information (often called "records" or "lines") that are individually different but share some common traits.

Computer file Organizing the data in a file 1 The way information is grouped into a file is entirely up to how it is designed. This has led to a plethora of more or less standardized file structures for all imaginable purposes, from the simplest to the most complex. Most computer files are used by computer programs which create, modify or delete the files for their own use on an as-needed basis. The programmers who create the programs decide what files are needed, how they are to be used and (often) their names.

Computer file Organizing the data in a file 1 In some cases, computer programs manipulate files that are made visible to the computer user. For example, in a word-processing program, the user manipulates document files that the user personally names. Although the content of the document file is arranged in a format that the word-processing program understands, the user is able to choose the name and location of the file and provide the bulk of the information (such as words and text) that will be stored in the file.

Computer file Organizing the data in a file 1 Many applications pack all their data files into a single file called an archive file, using internal markers to distinguish the different types of information contained within. An archive file reduces the number of files for easier transfer, reduces storage usage, or simply organizes outdated files. The archive file must often be unpacked before it can be used again.
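
As a rough illustration of packing and unpacking, the sketch below uses the ZIP format from Python's standard `zipfile` module as one example of an archive format; the text does not name a specific format. The function names `pack` and `unpack` and the sample file names are hypothetical.

```python
import io
import zipfile


def pack(files):
    """Pack a {name: bytes} mapping into a single in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def unpack(archive_bytes):
    """Recover the {name: bytes} mapping from the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Hypothetical application data: two files become one archive.
original = {
    "settings.txt": b"theme=dark\n" * 50,
    "notes.txt": b"remember the milk\n" * 50,
}
packed = pack(original)
restored = unpack(packed)
```

Here the member names stored inside the archive play the role of the internal markers, and the compression step is what reduces storage usage for repetitive content.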

Microsoft Excel Using external data 1 Excel users can access external data sources via Microsoft Office features such as .odc connections built with the Office Data Connection file format. Excel files themselves may be updated using a Microsoft-supplied ODBC driver.

Microsoft Excel Using external data 1 Excel can accept data in real time through several programming interfaces, which allow it to communicate with many data sources such as Bloomberg and Reuters (through add-ins such as Power Plus Pro).

Microsoft Excel Using external data 1 DDE: "Dynamic Data Exchange" uses the message passing mechanism in Windows to allow data to flow between Excel and other applications. Although it is easy for users to create such links, programming such links reliably is so difficult that Microsoft, the creator of the system, officially refers to it as "the protocol from hell". In spite of its many issues, DDE remains the most common way for data to reach traders in financial markets.

Microsoft Excel Using external data 1 Network DDE: extended the protocol to allow spreadsheets on different computers to exchange data. Given the view above, it is not surprising that Microsoft no longer supports the facility as of Vista.

Microsoft Excel Using external data 1 Real Time Data (RTD): although in many ways technically superior to DDE, RTD has been slow to gain acceptance, since it requires non-trivial programming skills and, when first released, was neither adequately documented nor supported by the major data vendors.

Microsoft Excel Using external data 1 Alternatively, Microsoft Query provides ODBC-based browsing within Microsoft Excel.

Micah Altman Data curation 1 To yield reliable and comparable results, standard methods of data encoding are needed for data attribution and data citation, and for maximally accurate data verification and replication.

Micah Altman Data curation 1 The citation information they recommended included two elements. The first is a unique global identifier: a short character string, guaranteed to be unique among all such identifiers, that permanently identifies the data set independent of its location. The second is a universal numeric fingerprint: a fixed-length string of numbers and characters that summarizes all the content in the data set, such that a change in any part of the data would produce a different fingerprint.
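
The fingerprint property described above (fixed length, sensitive to any change in the data) can be illustrated with an ordinary cryptographic hash. The sketch below is not the universal numeric fingerprint algorithm itself; it substitutes SHA-256 from Python's standard `hashlib` module as a stand-in, and the function name `fingerprint` and the sample data are hypothetical.

```python
import hashlib


def fingerprint(rows):
    """Return a fixed-length hex digest summarizing every cell of a table.

    Stand-in only: this uses SHA-256, not the actual UNF procedure.
    """
    digest = hashlib.sha256()
    for row in rows:
        for cell in row:
            data = repr(cell).encode("utf-8")
            # Length-prefix each cell so that cell boundaries affect the hash.
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
        digest.update(b"\xff" * 8)  # row terminator
    return digest.hexdigest()


dataset = [["id", "score"], [1, 0.75], [2, 0.5]]
altered = [["id", "score"], [1, 0.75], [2, 0.51]]

fp_original = fingerprint(dataset)
fp_altered = fingerprint(altered)
```

The digest is always 64 hexadecimal characters regardless of the size of the table, and the two sample tables, which differ in a single cell, produce different digests.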