
ROUND PEGS INTO SQUARE HOLES:
DATA WAREHOUSES FOR THE HARDWARE IMPAIRED
Michael Davis, Bassett Consulting Services, Inc., North Haven, Connecticut

Abstract
One of the dirty secrets of most data warehouse projects is that they require vast amounts of disk drive space and robust computers for successful implementation. Not every organization that lusts for a comprehensive data warehouse can finance the tab for the hardware that may be required to create the warehouse of their dreams.

Bassett Consulting Services has helped two clients make better use of limited computing resources through a two-step approach. First, an active data dictionary permits "point and click" selection, renaming, joining, and sub-setting of data, even when it resides on tape cartridges or in hierarchical files. Then "code engines" create SAS code on the fly to extract the requested data from their actual physical locations and transform the data into SAS data sets. These data sets even contain variable labels and formats, courtesy of the data dictionary, and are ready for further analytical processing.
Introduction
When Bassett first began creating SAS/AF® FRAME applications, our custom was to incorporate SAS data step program code through SCL SUBMIT blocks, which were compiled along with the rest of the application. We became dissatisfied with this approach for several reasons.

First, our clients were asking for "point and click" selection of variables. Within the programming structures that we used, it was not possible to comply with their requests without custom programming for each new data source. Compounding the problem was the realization that the data steps incorporated into our applications involved much repetitive code. Bassett wanted to employ an approach that would facilitate the adoption of object-oriented programming techniques.

Second, it seemed that the SAS System's best features for developing "point and click" interfaces required that the data be in a SAS data set or relational database management system (RDBMS) and stored on disk. Unfortunately, our clients could not afford to devote the disk space necessary to permanently store their data warehouses on disk.

Third, looking towards the future, Bassett wanted to position the applications it developed so that when Version 7 is released, SAS variable names up to 32 characters long can be accommodated without re-coding and recompiling. Similarly, when it was necessary to create time-series data sets, we wanted to be able to look up a five-character variable prefix easily. The data dictionary approach addresses both needs.

Fourth, it was clear that no matter how well any data warehouse might be designed, there would always be some essential information not catalogued in the data warehouse. Thus a data consolidation structure should be created that would allow our clients to integrate data from SAS data sets, RDBMSs, and flat-files stored on disk or tape into tables for subsequent analysis and reporting.

This was the impetus for developing the Meta-Master utilities and the Data Integrator interfaces. The Meta-Master utilities consist of a data dictionary and SAS/AF applications designed to populate the dictionary, maintain the dictionary, and to define objects called variable lists.
Figure 1 illustrates the relationship between Data Integrator and Meta-Master. Underneath the user interface layer of Data Integrator are the Meta-Master data dictionary, the Meta-Master utilities, and the Data Integrator code engines. The data dictionary is where the details for each data element are stored. The Meta-Master utilities are used to automatically populate and maintain the data dictionary.

Figure 1: Data Integrator architecture (SAS/AF FRAME Interface, Code Engines, Meta-Master Utilities, Data Dictionary)

The core of Data Integrator is composed of "code engines". These engines write SAS code to extract, sort, combine, and manipulate data as specified by lists called "job streams". These job streams are specified through Data Integrator's "point and click" interface, made possible by the Meta-Master data dictionary.

To better understand how Meta-Master and Data Integrator work, let's examine how variable lists and job stream lists are constructed.

Variable Lists

For the Meta-Master data dictionary and utilities to properly function, each data element must have a unique identification. Meta-Master uses the SAS System two-level convention (<libref>.<member>) to specify to which table a data element (variable) belongs. This is illustrated in Figure 2. Since the way in which Meta-Master employs this form of categorization has some unusual features, let us review each of its components.

Figure 2: Data Dictionary (identification of a data element by libref, table, and variable)

SAS users may already be familiar with the term libref. A libref is an alias which tells the SAS System the physical location of the library where it can find a specified data set member. However, since the Data Integrator system is designed to reference flat-files in a manner that is largely transparent to its users, Meta-Master extends the definition of libref to include a single flat-file stored on disk or tape.

As an example, let us suppose that the Meta-Master routines were to be employed by a health insurer. We might have a series of patient claims data sets, one for each month. Rather than having to redundantly define the data elements in the various claims tables for each month, we can use a single libref (and table) to represent a typical month. Then we merely have to tell Data Integrator for which month (and year) we desire our data.

Our use of the term "table" is similar to the SAS data set member. We decided to refer to data set members as tables for two reasons. First, our definition of tables should be viewed as logical or symbolic, while SAS data set members are but one form of physical representation. Second, we wanted to reserve the possibility that we might extend the Meta-Master utilities in the future to allow multiple tables to point back to a single SAS data set, as is currently permitted with flat-files.

The term "variable" refers to a single data element or SAS variable. The combination of libref, table, and variable points to a unique data element in our system of categorization. However, during testing of the Meta-Master prototype on a rather large table with several thousand variables, it was decided to provide two additional means to sub-categorize variables.

A variable "category" classifies similar variables for ease of selection. Where variables are distinguished only by the time period each variable represents or its position in a series, an "index value" is assigned to it in the data dictionary. It is important to note that the use of categories and index values in the Meta-Master system is optional and that categories and index values are only used to subset the number of variables shown for a table during selection.
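To make the logical-table idea in the monthly claims example above concrete, the sketch below shows one way a single logical claims table could be resolved to a physical monthly member once a month and year have been chosen. The libref, member-naming pattern, and macro are our own illustration and are not part of Meta-Master itself.

   /* Illustrative sketch only: resolving a logical claims table to a     */
   /* physical monthly member. The libref, member-naming pattern, and     */
   /* macro name are hypothetical.                                        */
   libname claims 'd:\warehouse\claims';

   %macro get_month(yy, mm);
      data work.claims_extract;
         set claims.clm&yy&mm;    /* e.g., claims.clm9701 for January 1997 */
      run;
   %mend get_month;

   %get_month(97,01);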
Variable List Structure

The heart of the variable list is a list of the variables to be included in a particular job step. When a variable list is created, it is stored in a SAS catalog as an SLIST entry, and the person creating the list is encouraged to supply a narrative description to be stored with the SLIST catalog entry. The structure of the variable list is illustrated in Figure 3.

Although all of the code engines created for the Data Integrator thus far handle only one libref and table per variable list, Meta-Master stores the libref and table for each variable in the event that it becomes desirable to relax this restriction in the future.

Figure 3: VARIABLE LIST (SLIST CATALOG ENTRY)
   Variables
   DROP=   (dropped variables list)
   RENAME= (old variables = new variables)
   WHERE=  (WHERE clauses)
Other information is stored along with the list of variables for those job steps that may require that information. These pieces of information are used when the code engines write the SAS code for the step. For example, if a job step requires that some incoming variables be renamed to prevent "collisions", a list of original and renamed variables can be stored in the variable list. Variables that are not required for the current step can be dropped. If WHERE processing is required, it can be applied to only the data sets that require it and before the current observation is brought into the job step.

Variable lists are managed by the Meta-Master utilities and routines. These lists are referenced by the Data Integrator application, which invokes the code engines, written in SCL (Screen Control Language). Data Integrator uses job stream lists to combine these elements and accomplish data extraction and manipulation.
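To suggest how the stored RENAME, DROP, and WHERE information surfaces in generated code, a merge step built from two variable lists might look something like the following; the librefs, member names, and variable names are hypothetical and the actual code engines write this kind of code automatically.

   /* Illustrative sketch only: a merge step generated from two variable  */
   /* lists. RENAME= prevents a name collision, DROP= removes an unneeded */
   /* variable, and WHERE= is applied only to the data set that needs it, */
   /* before the observation enters the step. All names are hypothetical. */
   proc sort data=claims.clm9701 out=work.claims;  by patid;  run;
   proc sort data=member.roster  out=work.roster;  by patid;  run;

   data work.claim_detail;
      merge work.claims (in=inclaim where=(paidamt > 0))
            work.roster (rename=(status=mbr_status) drop=homephone);
      by patid;                        /* BY variables form the merge key */
      if inclaim;
   run;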
Job Stream Lists
Each job stream list consists of at least one step and can have as many steps as desired. Job streams are stored as nested lists, in job step sequence, within a SLIST catalog entry. The structure of a typical job stream is illustrated in Figure 4.

Figure 4: JOB STREAM LIST (SLIST CATALOG ENTRY)
   JOB STEP= INFILE
   JOB STEP= SORT
   JOB STEP= MERGE

Data Integrator processes a job stream one step at a time. The type of step is specified by the list name under which the job step was stored. At the time this paper was prepared, Bassett had created code engines to write SAS statements and other types of code for the job step types described below. Some of the code engines execute on the desktop computer on which Data Integrator runs. Other code engines take advantage of the SAS System's ability to distribute program logic and submit code to be executed on a remote computer, such as an MVS mainframe.

What do the different engines do? The INFILE engine writes a data step to extract a list of variables from a disk or tape flat-file. The SET engine brings in one or more SAS data sets for sub-setting and transformation. The MERGE engine brings in two or more data sets and combines them using BY variables as the merge key.

The SORT engine does what one expects: it sorts a single SAS data set. However, one can use this engine to rename or drop variables and to subset the observations copied to the output data set. The output data set need not be the input data set.

The TEMPLATE engine was developed to accommodate those situations where complex data step logic is required. Rather than develop a convoluted menu structure to handle all potential situations, the user keys a program template. For some applications, the template is inserted exactly as keyed. However, the TEMPLATE engine shines when Data Integrator uses the data dictionary to "fill in the blanks". The user merely keys in the word INPUT on its own line and lists the variables to be included in the data step. Data Integrator looks up the variable offsets and informats required to fill out the INPUT statement. Thus, a piece of program logic can be completed from the dictionary rather than keyed by hand.

As shown in the preceding example, Data Integrator writes data step statements using information stored in the data dictionary. The COPY and DOWNLOAD engines were developed for a client which routinely stores needed data on an MVS mainframe. After using Data Integrator to create and remotely submit an MVS batch job to extract the data, the COPY engine submits a remote SAS program to copy the extract to disk. Once the data has been copied to disk, the DOWNLOAD engine downloads the data to the desktop, where the SAS/ACCESS Interface to PC File Formats converts the downloaded data to a Microsoft Excel .XLS file. Using the same technology, other data step processing or procedures can be accommodated. Further, jobs to be remotely submitted to any platform supported by the SAS System, including operating system and utility program statements, can be cloned from existing engines.
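The remote copy-then-download pattern that the COPY and DOWNLOAD engines automate can be sketched with SAS/CONNECT roughly as follows. The session name, libnames, and member names are hypothetical, and the real engines generate this code rather than requiring it to be keyed.

   /* Illustrative sketch only: remote copy and download, assuming a      */
   /* SAS/CONNECT session to MVS. Session name, libnames, and member      */
   /* names are hypothetical.                                             */
   signon mvs;

   rsubmit mvs;                          /* executes on the mainframe     */
      libname tapelib 'PROD.CLAIMS.EXTRACT';
      data work.extract;                 /* COPY: move the extract to disk */
         set tapelib.clm9701;
      run;

      proc download data=work.extract    /* DOWNLOAD: send it to the PC   */
                    out=work.extract;
      run;
   endrsubmit;

   signoff mvs;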
Additional Job Step Parameters
What kinds of additional information do the code engines require? It depends on the code engine. As an example, let us consider the SET code engine when the job is to be both created and run on the desktop platform. Five items must be specified. The output libref and output member are inserted into the DATA statement created by the SET code engine. The drop list allows variables that are drawn from the input SAS data sets only for use by a sub-setting IF statement to be dropped at the completion of the data step.

A SET job step may contain more than one variable list, since a SET statement can specify more than one SAS data set to be incorporated into a data step. Data Integrator loops through all the requested SAS data sets as it builds the SET statement. The data set options specified within each variable list are inserted in the SET statement after each SAS data set.
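The following sketch suggests what a data step built by the SET code engine might look like, with the output libref and member on the DATA statement, data set options following each input data set, and a drop of a variable used only by the sub-setting IF. All names are hypothetical.

   /* Illustrative sketch only: a data step of the kind the SET code      */
   /* engine builds. Librefs, members, and variable names are hypothetical. */
   data anlib.claims_extract;                      /* output libref.member */
      set claims.clm9701 (keep=patid svcdate paidamt)
          claims.clm9702 (keep=patid svcdate paidamt
                          rename=(paidamt=paidamt2));
      if paidamt > 0 or paidamt2 > 0;              /* sub-setting IF       */
      drop paidamt2;                               /* used only by the IF  */
   run;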
Data Dictionary Features
One feature of the Meta-Master Data Dictionary is the ability to store the data type for each element. Data Type can assume one of several values.

Libref, Table, and Variable Name have been discussed previously. Variable Type can be either character or numeric. Label is a forty-character variable which contains the SAS variable label.

With respect to the other fields shown on the screen shown in Figure 6, the Informat, Format, and Length fields are familiar to SAS users and retain their usual meanings. Offset is the starting position of the field to be read to create the SAS variable. It is required when reading data from a tape or disk flat-file.
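To show how the Offset, Informat, and Length fields end up in generated code when a flat-file is read, an INPUT statement assembled from dictionary entries might look like this; the fileref, variable names, offsets, and informats are hypothetical.

   /* Illustrative sketch only: an INPUT statement assembled from         */
   /* dictionary entries. Fileref, variable names, offsets, and informats */
   /* are hypothetical.                                                   */
   filename rawclm 'claims_199701.dat';

   data work.claims;
      infile rawclm;
      input @1   patid    $char9.     /* offset 1,  character, length 9   */
            @10  svcdate  yymmdd8.    /* offset 10, date as yyyymmdd      */
            @18  paidamt  pd5.2       /* offset 18, packed decimal        */
            @23  provid   $char6. ;   /* offset 23, character             */
      format svcdate date9.;
   run;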
Transformation is a unique feature of the Meta-Master Data Dictionary. For example, a tape data set might have a transaction date, representing a Julian date, that is stored as a packed decimal. In our applications, we want to use it as a SAS date.

When the code engines used to extract data are run, they inspect the value of the Transform(ation) data dictionary field. If it is not empty, Data Integrator creates an assignment statement which incorporates the transformation. In our example, the field on the tape might be read using the PD3. informat, the appropriate transformation would be entered in the dictionary, and Data Integrator would build the corresponding assignment statement in the created data step.

While transformations were incorporated into Meta-Master for the purpose just described, they can be used to solve other problems. These uses include summing variables across an observation and counting observations either conditionally or unconditionally. To illustrate the creation of a true/false variable, if we had entered a transform for a numeric variable testco, where company is a character value containing the names of companies, testco would be true when company contained the word "BASSETT".

The key thing to remember about transformations is that once they are entered into the data dictionary, Data Integrator's code engines become smart enough to remember the desired processing logic whenever that SAS variable is selected.
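As an illustration of the kind of assignment statements such transformations could generate, consider the sketch below. The specific function calls, data set names, and variable names are our own assumptions, not code reproduced from the paper.

   /* Illustrative sketch only: assignment statements a transformation    */
   /* entry might generate inside a created data step. All names are      */
   /* hypothetical.                                                       */
   data work.claims;
      set work.claims_raw;

      /* Julian date read from tape with the PD3. informat, converted to  */
      /* a SAS date value                                                 */
      trandate = datejul(trandate);

      /* true/false flag: 1 when the company name contains BASSETT        */
      testco = (index(company, 'BASSETT') > 0);
   run;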
Autoloader

The Meta-Master utilities would be far less useful if it were necessary to key in the dictionary information by hand. As observed in a preceding discussion, some of our clients routinely work with tables containing thousands of variables. Fortunately, one of the Meta-Master utilities is a FRAME application that automatically loads the data dictionary.

How is this done? For almost any table we might insert into the data dictionary, we have one of three items. We might have a copy of the data set or one with a similar structure. When this is not the case, we almost surely have a SAS program with an INPUT statement coded to read the raw data. Last, if the raw data was created by a production program, we might have access to a COBOL record description for the raw data file. In this case, we employ the COB2SAS utilities available at no cost from SAS Institute's Technical Support department. COB2SAS is used to translate the record description into SAS INPUT and LABEL statements, which are then processed by the Autoloader.

How does the Autoloader work? In all cases, Meta-Master reads the SAS System dictionary views to determine the table layout and to populate the Meta-Master data dictionary. In the case where a table is drawn from a SAS data set or view, all of the required information can be obtained from the SAS System dictionary views.

In the case of tape and disk flat-files, two additional items must be obtained: the variable offsets and informats. This information is obtained by parsing the INPUT statement. To use the Autoloader to define a table created from a flat-file, the user enters the name of the file where the SAS program containing an INPUT statement for the file is located. An editor window is opened and the file is loaded. The user deletes the DATA and INPUT statements since the Autoloader supplies them. Other statements, such as LABEL statements, can remain.

In addition to creating the SAS System dictionary views and parsing the INPUT statement for variable offsets and informats, the Autoloader identifies appropriate variable formats based on the informats.
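For SAS data sets and views, the table layout the Autoloader needs is available through the SAS dictionary tables. A minimal sketch of that kind of query follows; the output data set name and the libref filter are hypothetical.

   /* Illustrative sketch only: pulling column metadata from the SAS      */
   /* dictionary views, as the Autoloader does for SAS data sets and      */
   /* views. The libref filter and output table name are hypothetical.    */
   proc sql;
      create table work.metadict as
      select libname, memname, name, type, length,
             format, informat, label
      from dictionary.columns
      where libname = 'CLAIMS';
   quit;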
Building Job Streams

The user clicks on a radio box station to select the next job step to be built. Clicking on the REVIEW job stream station prints a detailed list of the job stream under construction to the SAS log. If the wrong job step is selected, clicking on the UNDO last step button backs out the last step entered.

To keep users from wondering whether the job step they just entered was accepted, a "stack" displays the job steps that have been previously entered.
Running Job Streams

One feature of this interface is that the librefs and filerefs associated with the selected job stream are verified before the job stream is actually executed. If a libref or fileref is unassigned, a pop-up window allows the user to make the assignment or to cancel execution before any time is wasted running the job stream.
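A minimal sketch of that kind of check, using the SAS LIBREF and FILEREF functions, is shown below. The macro and the libref and fileref names are hypothetical; Data Integrator performs the verification from its SCL interface rather than with a macro.

   /* Illustrative sketch only: verifying that a libref and a fileref are */
   /* assigned before running a job stream. Names are hypothetical.       */
   %macro check_refs(libref, fileref);
      %if %sysfunc(libref(&libref)) ne 0 %then
         %put WARNING: libref &libref is not assigned.;
      %if %sysfunc(fileref(&fileref)) > 0 %then
         %put WARNING: fileref &fileref is not assigned.;
   %mend check_refs;

   %check_refs(CLAIMS,RAWTAPE);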
Custom Interface for Remote MVS Batch Jobs

For one client, Bassett created a custom interface and code engines to meet their requirements. The data they wished to extract was stored in tape cartridges on an MVS mainframe. The data sets were very large, typically occupying 50 or more IBM 3480 tape cartridges. Their analysts needed the ability to extract data for a single client and download it for analysis with Microsoft Excel.

When the client was shown a prototype Data Integrator interface that allowed them to define and store job stream lists, their reaction was that the interface was needlessly complex and confusing. To satisfy their concern, we modified the Data Integrator interface to build a disposable, single-step job stream list. Thus, users at this site only have to specify a single job step at a time.

The INFILE code engine was modified to submit the extraction program in batch mode, along with the required MVS JCL. To minimize both contention for the tape data sets and execution time, the code engine also inserts a SyncSort® extraction program.

As part of this customized version of Data Integrator, the client was furnished with a utility FRAME application that catalogs which clients are stored on each tape cartridge of selected tape flat-files. This information is used by the SyncSort extraction program so that only the cartridges containing the selected client's information are mounted and read.

Conclusion

The Data Integrator and Meta-Master represent an object-oriented approach using SAS System software to rapidly develop custom data extraction and manipulation applications. These applications may be used to populate data warehouses or for other data extraction and analysis tasks.

Since the analysis data does not need to be retained on disk storage, this approach should be attractive to SAS System users who may be working under hardware or budget constraints yet do not want to sacrifice the "point and click" interfaces that SAS software makes possible.

An ancillary benefit of this approach is that a data dictionary is created. This dictionary is a good tool for managing a data warehouse. Because the dictionary is stored as a SAS data set, it is relatively simple to interface it to other SAS applications.

Acknowledgments

SAS and SAS/AF are registered trademarks of SAS Institute Inc., Cary, NC.

SyncSort is a registered trademark of Syncsort Inc.

Data Integrator, Meta-Master, and Data Builder are trademarks of Bassett Consulting Services, Inc.

