Saturday, 19 September 2015

Learning SAS

This blog contains some basic SAS commands and their applications, along with the output. The problems are taken from the book "Learning SAS by Example: A Programmer's Guide", Chapter 7 onwards. All the programs are saved in the library a16. Let's begin.


*Setting up the library;

libname a16 "/folders/myshortcuts/myfolder/iSAS/a15016";


Chapter 7 - Problem 1;

data a16.a16_school;
   length Quiz $ 1;
   input Age Quiz $ Midterm Final;

   *Using IF and ELSE IF statements, compute two new variables as follows:
    Grade (numeric), with a value of 6 if Age is 12 and a value of 8 if Age is 13;
   if Age = 12 then Grade = 6;
   else if Age = 13 then Grade = 8;

   *The quiz grades have numerical equivalents as follows: A = 95, B = 85, C = 75, D = 70, and F = 65;
   if Quiz = 'A' then Quiz_eq = 95;
   else if Quiz = 'B' then Quiz_eq = 85;
   else if Quiz = 'C' then Quiz_eq = 75;
   else if Quiz = 'D' then Quiz_eq = 70;
   else if Quiz = 'F' then Quiz_eq = 65;

   *Compute a course grade (Course) as a weighted average of the
    Quiz (20%), Midterm (30%) and Final (50%);
   Course = (.2*Quiz_eq + .3*Midterm + .5*Final);
datalines;
12 A 92 95
12 B 88 88
13 C 78 75
13 A 92 93
12 F 55 62
13 B 88 82
;
run;

proc print data = a16.a16_school;
run;


Output:


Learning:
Use of 'if then else' and computing a new variable.


Chapter 7 - Problem 2;
*Creating Data set HOSP;
libname a16 "/folders/myshortcuts/myfolder/iSAS/a15016";
data a16.a16_hosp;
   do j = 1 to 1000;
      AdmitDate = int(ranuni(1234)*1200 + 15500);
      quarter = intck('qtr','01jan2002'd,AdmitDate);
      do i = 1 to quarter;
         if ranuni(0) lt .1 and weekday(AdmitDate) eq 1 then
            AdmitDate = AdmitDate + 1;
         if ranuni(0) lt .1 and weekday(AdmitDate) eq 7 then
            AdmitDate = AdmitDate - int(3*ranuni(0) + 1);
         DOB = int(25000*Ranuni(0) + '01jan1920'd);
         DischrDate = AdmitDate + abs(10*rannor(0) + 1);
         Subject + 1;
         output;
      end;
   end;
   drop i j;
   format AdmitDate DOB DischrDate mmddyy10.;
run;

*Use PROC PRINT to list observations for Subject values of 5, 100, 150, and 200;

*Using the IN operator;
proc print data = a16.a16_hosp;
   where Subject in (5 100 150 200);
run;

*Using the OR operator;
proc print data = a16.a16_hosp;
   where Subject = 5 or Subject = 100 or Subject = 150 or Subject = 200;
run;

Output: The output remains the same in either case.


Learning
Use of the OR and IN operators in a WHERE statement. IN is the shorter way to test one variable against a list of values.

Chapter 7 - Problem 4;

*Creating a subset of Sales with the Region and TotalSales variables;
data a16.Sales_new;
   /* KEEP= goes on the SET statement so that the new Weight variable is not dropped */
   set a16.Sales (keep = Region TotalSales);

   *Adding a new variable called Weight with values of 1.5 for the North Region,
    1.7 for the South Region, and 2.0 for the West and East Regions. Use a SELECT statement to do this;
   select;
      when (Region eq 'North') Weight = 1.5;
      when (Region eq 'South') Weight = 1.7;
      when (Region eq 'West') Weight = 2.0;
      when (Region eq 'East') Weight = 2.0;
      otherwise;
   end;
run;

proc print data = a16.Sales_new;
run;

Output:


Learning
Use of SELECT statement
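
As a small variation on the program above, SELECT can also take the variable as its expression, which lets one WHEN clause list several values. This is only a sketch: the data set work.select_demo and its three rows are made up for illustration and are not the Sales data from the book.

```sas
/* Hypothetical demo data, not the Sales data set from the book */
data work.select_demo;
   input Region : $5. TotalSales;
   select (Region);
      when ('North')        Weight = 1.5;
      when ('South')        Weight = 1.7;
      when ('West', 'East') Weight = 2.0;   /* one WHEN, two values */
      otherwise             Weight = .;     /* unknown regions stay missing */
   end;
datalines;
North 100
East 250
South 75
;
run;

proc print data = work.select_demo;
run;
```

With these rows, Weight should come out as 1.5, 2.0 and 1.7 respectively.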

Chapter 8 - Problem 1;

*Creating a dataset named Vitals;

data a16.a16_vitals;
   input ID : 3. Age Pulse SBP DBP;
   label SBP = "Systolic Blood Pressure"
         DBP = "Diastolic Blood Pressure";
datalines;
001 23 68 120 80
002 55 72 188 96
003 78 82 200 100
004 18 58 110 70
005 43 52 120 82
006 37 74 150 98
007  . 82 140 100
;
run;

proc print data = a16.a16_vitals;
run;

data a16.a16_newvitals;
   set a16.a16_vitals;
   /* Declare the lengths first, otherwise 'High' is truncated to 'Hig' */
   length PulseGroup SBPgroup $ 4;
   /* A missing Age compares as less than 50, so it is excluded explicitly */
   if not missing(Age) and Age lt 50 then do;
      if Pulse lt 70 then PulseGroup = 'Low';
      else if Pulse ge 70 then PulseGroup = 'High';
      if SBP lt 130 then SBPgroup = 'Low';
      else if SBP ge 130 then SBPgroup = 'High';
   end;
run;

proc print data = a16.a16_newvitals;
run;

Output:



Chapter 8 - Problem 2;
data a16.a16_monthsales;
   input month sales;
   SumSales + sales;
datalines;
1 4000
2 5000
3 .
4 5500
5 5000
6 6000
7 6500
8 4500
9 5100
10 5700
11 6500
12 7500
;
run;

proc print data = a16.a16_monthsales;
run;

Output:


Learning
Use of SUM Statement
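
To see why the SUM statement is convenient, here is a minimal sketch that compares it with a plain assignment. The data set name work.sum_demo and the variable names RunningTotal and PlainTotal are made up for this illustration.

```sas
/* Hypothetical demo: SUM statement versus plain assignment */
data work.sum_demo;
   input sales;
   /* SUM statement: starts at 0, is retained automatically, ignores missing values */
   RunningTotal + sales;
   /* Plain assignment: needs RETAIN, and becomes missing once sales is missing */
   retain PlainTotal 0;
   PlainTotal = PlainTotal + sales;
datalines;
4000
5000
.
5500
;
run;

proc print data = work.sum_demo;
run;
```

RunningTotal should read 4000, 9000, 9000, 14500, while PlainTotal reads 4000, 9000 and is then missing for the last two rows, because a missing value propagates through the + operator.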


Chapter 8 - Problem 4;
data a16.a16_missing;
input A $ B $ C $;
MissA = missing(A);
MissB = missing(B);
MissC = missing(C);
TotalMiss = MissA+MissB+MissC;
datalines;
X Y Z
X Y Y
Z Z Z
X X .
Y Z .
X . .
;
run;
proc print data = a16.a16_missing;
run;

Output:


Learning
Use of the MISSING function to flag missing values (1 = missing, 0 = not missing). Adding the flags gives the number of missing values in each observation.
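
As an alternative sketch, the CMISS function counts the missing arguments in one call and accepts both character and numeric variables. The data set name work.cmiss_demo is made up for this illustration, and only three of the rows are reused.

```sas
/* Hypothetical demo: counting missing values per observation with CMISS */
data work.cmiss_demo;
   input A $ B $ C $;
   TotalMiss = cmiss(A, B, C);
datalines;
X Y Z
X X .
X . .
;
run;

proc print data = work.cmiss_demo;
run;
```

TotalMiss should be 0, 1 and 2 for the three rows.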

Chapter 9 - Problem 1;

data a16.a16_dates;
input @1 subject $3.
@4 dob mmddyy10.
@14 visit date9.;
age=yrdif(dob,visit);
datalines;
00110/21/195011Nov2006
00201/02/195525May2005
00312/25/200525Dec2006
;
run;
proc print data = a16.a16_dates;
run;

Output


Learning
Use of the YRDIF( ) function to find the difference in years between two dates. This is a handy way to calculate a person's age from a date of birth.
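
A minimal sketch of the age calculation, assuming SAS 9.3 or later, where YRDIF accepts the 'AGE' basis. The data set name work.age_demo and the variable names age_exact and age_years are made up for this illustration. The two dates are those of subject 001 above.

```sas
/* Hypothetical demo: age in years from two SAS dates */
data work.age_demo;
   dob   = '21oct1950'd;
   visit = '11nov2006'd;
   age_exact = yrdif(dob, visit, 'AGE');   /* age with a fractional part */
   age_years = int(age_exact);             /* completed years only */
   format dob visit date9.;
run;

proc print data = work.age_demo;
run;
```

age_years should be 56 here, since the 56th birthday falls three weeks before the visit.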

Chapter 9 - Problem 2;
data a16.a16_threedates;
input @1 date1960_2006 mmddyy8.;
format date1960_2006 date9.;
datalines;
01/01/11
02/23/05
03/15/15
05/09/06
;
proc print data=a16.a16_threedates;
run;

Output




Chapter 9 - Problem 4;
data a16.a16_hosp;
   do j = 1 to 1000;
      AdmitDate = int(ranuni(1234)*1200 + 15500);
      quarter = intck('qtr','01jan2002'd,AdmitDate);
      do i = 1 to quarter;
         if ranuni(0) lt .1 and weekday(AdmitDate) eq 1 then
            AdmitDate = AdmitDate + 1;
         if ranuni(0) lt .1 and weekday(AdmitDate) eq 7 then
            AdmitDate = AdmitDate - int(3*ranuni(0) + 1);
         DOB = int(25000*Ranuni(0) + '01jan1920'd);
         DischrDate = AdmitDate + abs(10*rannor(0) + 1);
         Subject + 1;
         output;
      end;
   end;
   drop i j;
   format AdmitDate DOB DischrDate mmddyy10.;
run;
proc print data=a16.a16_hosp (obs=10);
run;

Output

Learning
Use of the RANUNI( ) function to generate uniform random numbers between 0 and 1.
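
The sketch below isolates the first step of the HOSP program to show how a uniform random number is scaled into a date. The data set name work.ranuni_demo and the variable u are made up for this illustration. The seed 1234 and the scaling are taken from the program above.

```sas
/* Hypothetical demo: scaling RANUNI output into a range of SAS dates */
data work.ranuni_demo;
   do i = 1 to 5;
      u = ranuni(1234);                  /* uniform value between 0 and 1 */
      AdmitDate = int(u*1200 + 15500);   /* whole number from 15500 to 16699 */
      output;
   end;
   format AdmitDate mmddyy10.;
run;

proc print data = work.ranuni_demo;
run;
```

Because the seed is a positive number, the same five values come back on every run. A seed of 0, as used inside the inner loop of the HOSP program, takes the seed from the clock instead, so those values change from run to run.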

Chapter 10 - Problem 1;

data a16.a16_blood;
   infile "/folders/myshortcuts/myfolder/iSAS/Datasets/blood.txt" truncover;
   input Subject Gender $ BloodType $ AgeGroup $ WBC RBC Chol;
run;

proc print data = a16.a16_blood (obs=5);
run;

Output


data a16.a16_subsetA;
set a16.a16_blood;
where Gender = "Female" and BloodType = "AB";
Combined = (0.001*WBC)+ RBC;
run;
proc print data = a16.a16_subsetA (obs=5);
run;

Output



data a16.a16_subsetB;
set a16.a16_blood;
where Gender = "Female" and BloodType = "AB" and (0.001*WBC)+ RBC ge 14;
Combined = (0.001*WBC)+ RBC;
run;
proc print data = a16.a16_subsetB(obs=5);
run; 

Output


Learning
Learnt how to create subsets of a data set using SET together with a WHERE statement.

Chapter 10 - Problem 2;

data a16.a16_monday2002;
set a16.a16_hosp;
day = weekday(AdmitDate);
year = year(AdmitDate);
age = yrdif(DOB, AdmitDate);
age = round(age);
/* day and year are numeric, so compare them with numbers, not quoted strings */
if day = 2 and year = 2002;
run;
proc print data = a16.a16_monday2002;
run;

Output



Learning
Use of weekday( ) and year( ) to extract the day of the week and the year from a date, and round( ) to round off a number.
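
A minimal sketch of the three functions on a single known date. The data set name work.dateparts_demo and its variable names are made up for this illustration.

```sas
/* Hypothetical demo: WEEKDAY, YEAR and ROUND on one value each */
data work.dateparts_demo;
   d = '07jan2002'd;
   day_of_week = weekday(d);   /* 1 = Sunday, 2 = Monday, up to 7 = Saturday */
   yr = year(d);
   rounded = round(41.6);      /* nearest whole number */
   format d date9.;
run;

proc print data = work.dateparts_demo;
run;
```

7 January 2002 was a Monday, so day_of_week should be 2, yr should be 2002 and rounded should be 42.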

Chapter 10 - Problem 4;

data a16.a16_bicycles;
   input Country  & $25.
         Model    & $14.
         Manuf    : $10.
         Units    :   5.
         UnitCost :  comma8.;
   TotalSales = (Units * UnitCost) / 1000;
   format UnitCost TotalSales dollar10.;
   label TotalSales = "Sales in Thousands"
         Manuf = "Manufacturer";
datalines;
USA  Road Bike  Trek 5000 $2,200
USA  Road Bike  Cannondale 2000 $2,100
USA  Mountain Bike  Trek 6000 $1,200
USA  Mountain Bike  Cannondale 4000 $2,700
USA  Hybrid  Trek 4500 $650
France  Road Bike  Trek 3400 $2,500
France  Road Bike  Cannondale 900 $3,700
France  Mountain Bike  Trek 5600 $1,300
France  Mountain Bike  Cannondale  800 $1,899
France  Hybrid  Trek 1100 $540
United Kingdom  Road Bike  Trek 2444 $2,100
United Kingdom  Road Bike  Cannondale  1200 $2,123
United Kingdom  Hybrid  Trek 800 $490
United Kingdom  Hybrid  Cannondale 500 $880
United Kingdom  Mountain Bike  Trek 1211 $1,121
Italy  Hybrid  Trek 700 $690
Italy  Road Bike  Trek 4500  $2,890
Italy  Mountain Bike  Trek 3400  $1,877
;
run;
proc print data = a16.a16_bicycles (obs = 10);
run;

Output


data a16.a16_mountainUSA a16.a16_roadFrance;
set a16.a16_bicycles;
if Country eq "USA" and Model eq "Mountain Bike"
then output a16.a16_mountainUSA;
if Country eq "France" and Model eq "Road Bike"
then output a16.a16_roadFrance;
run;

proc print data = a16.a16_mountainUSA;
run;

Output



proc print data = a16.a16_roadFrance;
run;

Output


Chapter 11 - Problem 1;

data a16.a16_health;
   input Subj : $3.
         Height
         Weight;
datalines;
001 68 155
003 74 250
004 63 110
005 60 95
;
run;

data a16.a16_bmi (drop = height weight);
   set a16.a16_health;
   weight_kg  = weight / 2.2;               /* 1 kg = 2.2 lbs */
   height_mts = height * .0254;             /* 1 inch = .0254 m */
   bmi        = weight_kg / height_mts**2;  /* kg divided by metres squared */
   bmi_round  = round(bmi, 1);
   bmi_tenth  = round(bmi, .1);
   bmi_group  = round(bmi, 5);
   bmi_trunc  = int(bmi);
run;
title "BMI";
proc print data=a16.a16_bmi;
run;

Output


Chapter 11 - Problem 2;

data a16.a16_miss;
set a16.a16_blood;
if missing(wbc) then missWBC+1;
if missing(rbc) then missRBC+1;
if missing(chol) then missCHOL+1;
run;
proc print data=a16.a16_miss (obs=10);
run;

Output


Chapter 11 - Problem 3;

data a16.a16_blood_miss;
set a16.a16_blood;
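/* a16_blood has AgeGroup, not age; naming age here would only create a new, always-missing variable */
if missing(wbc) then call missing(AgeGroup, rbc, chol);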
run;
title "Missing Value";
proc print data=a16.a16_blood_miss (obs = 10);
run;

Output



Chapter 12 - Problem 2;

*Data set MIXED;
data a16.a16_mixed;
   input Name & $20. ID;
datalines;
Daniel Fields  123
Patrice Helms  233
Thomas chien  998
;

data a16.a16_mixed1(drop = first last first1 first2 last1 last2 f l);
set a16.a16_mixed;
name_low=lowcase(name);
name_prop=propcase(name);
first=scan(name,1,' ');
last=scan(name,2,' ');
first1=upcase(substr(first,1,1));
first2=lowcase(substr(first,2));
f=trim(first1)||trim(first2);
last1=upcase(substr(last,1,1));
last2=lowcase(substr(last,2));
l=trim(last1)||trim(last2);
name_hard=catx(' ',f,l);
run;
title "Names";

proc print data=a16.a16_mixed1;
run;

Output

Learning
Use of lowcase( ) and propcase( ) to convert letters to lowercase and proper case respectively.


Chapter 12 - Problem 3;

*Data set NAMES_AND_MORE;
data a16.a16_names_and_more;
   input Name $20.
         Phone & $14.
         Height & $10.
         Mixed & $8.;
datalines;
Roger   Cody        (908)782-1234  5ft. 10in.  50 1/8
Thomas  Jefferson   (315) 848-8484  6ft. 1in.  23 1/2
Marco Polo          (800)123-4567  5Ft. 6in.  40
Brian Watson        (518)355-1766  5ft. 10in  89 3/4
Michael DeMarco     (445)232-2233  6ft.       76 1/3
;
proc print data=a16.a16_names_and_more;
run;

Output


data a16.a16_name_number (drop = phone);
   set a16.a16_names_and_more;
   name = compbl(propcase(name));
   /* phone is a character variable, so the numeric value goes into a new variable */
   phone_num = input(compress(phone, '( ) -'), 10.);
run;
title "Names and More";

proc print data=a16.a16_name_number;
run;

Output


Learning
Use of COMPBL( ) and COMPRESS( ) function. The former converts two or more blanks to a single blank; the latter removes blanks (default action) or characters that you specify from a character value 
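
A minimal sketch of both functions on literal values taken from the first data line above. The data set name work.compress_demo and the variable names are made up for this illustration. The 'kd' modifiers (keep digits) are an alternative to listing the characters to remove.

```sas
/* Hypothetical demo: COMPBL and COMPRESS on fixed strings */
data work.compress_demo;
   length raw_name clean_name no_blanks $ 20 raw_phone digits $ 14;
   raw_name  = 'Roger   Cody';
   raw_phone = '(908)782-1234';
   clean_name = compbl(raw_name);            /* runs of blanks become one blank */
   no_blanks  = compress(raw_name);          /* default: remove all blanks */
   digits     = compress(raw_phone, , 'kd'); /* k = keep, d = digits */
   phone_num  = input(digits, 10.);          /* character to numeric */
run;

proc print data = work.compress_demo;
run;
```

This should give clean_name = Roger Cody, no_blanks = RogerCody, digits = 9087821234 and the numeric phone_num = 9087821234.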

Chapter 12 - Problem 4;

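data a16.a16_height (drop = h1 h2 h3);
   set a16.a16_names_and_more;
   /* Keep only the digits of each word and convert them to numeric in one step */
   h1 = input(compress(scan(height, 1, ' '), , 'kd'), 10.);   /* feet */
   h2 = input(compress(scan(height, 2, ' '), , 'kd'), 10.);   /* inches, may be missing */
   h3 = h1 * 12;
   Height_New = sum(h3, h2);   /* SUM ignores a missing inches value */
run;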
title "Names and More";

proc print data=a16.a16_height;
run;

Output



Chapter 13 - Problem 2;

data a16.a16_survey2;
   input ID 
         (Q1-Q5)(1.);
datalines;
535 13542
012 55443
723 21211
007 35142
;
run;
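data a16.a16_survey2_new;
   set a16.a16_survey2;
   array ques{5} Q1-Q5;
   /* DO OVER only works with implicit arrays, so loop over the index instead */
   do i = 1 to 5;
      /* Maps 1..5 to 4..0 as originally written; use 6 - ques{i} to reverse-score on the 1-5 scale */
      ques{i} = 5 - ques{i};
   end;
   drop i;
run;

proc print data=a16.a16_survey2_new;
run;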

Output



Chapter 13 - Problem 3;

*Data set NINES;
data a16.a16_nines;
   infile datalines missover;
   input x y z (Char1-Char3)(:$1.) a1-a5;
datalines;
1 2 3 a b c 99 88 77 66 55
2 999 999 d c e 999 7 999
10 20 999 b b b 999 999 999 33 44
;

proc print data=a16.a16_nines;
run;

Output



data a16.a16_nonines;
set a16.a16_nines;
array n{*} _numeric_;
do i=1 to dim(n);
if n{i}=999 then call missing(n{i});
end;
drop i;
run;
title "No Nines";

proc print data=a16.a16_nonines;
run;

Output




Chapter 13 - Problem 5;

data a16.a16_pass;
   infile datalines missover;
   input id test1-test5;
datalines;
001 90 88 92 95 90
002 64 64 77 72 71
003 68 69 80 75 70
004 88 77 66 77 67
;
run;

proc print data=a16.a16_pass;
run;
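data a16.a16_test;
   set a16.a16_pass;
   array pass{5} test1-test5;
   do i = 1 to 5;
      if pass{i} > 65;   /* subsetting IF: keep the student only if every score is above 65 */
   end;
   drop i;
run;
proc print data=a16.a16_test;
run;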

Output



Chapter 14 - Problem 1;

title "Blood and Subject";
proc print data=a16.a16_blood (obs=10) label;
label wbc="WHITE BLOOD CELL"
RBC="RED BLOOD CELL"
chol="CHOLESTEROL";
id subject;
run;

Output




Chapter 14 - Problem 2;

*Creating Data set SALES;

data a16.a16_sales;
   input    EmpID     :       $4. 
            Name      &      $15.
            Region    :       $5.
            Customer  &      $18.
            Date      : mmddyy10.
            Item      :       $8.
            Quantity  :        5.
            UnitCost  :  dollar9.;
   TotalSales = Quantity * UnitCost;
/*   format date mmddyy10. UnitCost TotalSales dollar9.;*/
   drop Date;
datalines;
1843 George Smith  North Barco Corporation  10/10/2006 144L 50 $8.99
1843 George Smith  South Cost Cutter's  10/11/2006 122 100 $5.99
1843 George Smith  North Minimart Inc.  10/11/2006 188S 3 $5,199
1843 George Smith  North Barco Corporation  10/15/2006 908X 1 $5,129
1843 George Smith  South Ely Corp.  10/15/2006 122L 10 $29.95
0177 Glenda Johnson  East Food Unlimited  9/1/2006 188X 100 $6.99
0177 Glenda Johnson  East Shop and Drop  9/2/2006 144L 100 $8.99
1843 George Smith  South Cost Cutter's  10/18/2006 855W 1 $9,109
9888 Sharon Lu  West Cost Cutter's  11/14/2006 122 50 $5.99
9888 Sharon Lu  West Pet's are Us  11/15/2006 100W 1000 $1.99
0017 Jason Nguyen  East Roger's Spirits  11/15/2006 122L 500 $39.99
0017 Jason Nguyen  South Spirited Spirits  12/22/2006 407XX 100 $19.95
0177 Glenda Johnson  North Minimart Inc.  12/21/2006 777 5 $10.500
0177 Glenda Johnson  East Barco Corporation  12/20/2006 733 2 $10,000
1843 George Smith  North Minimart Inc.  11/19/2006 188S 3 $5,199
;
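title "Sales";

/* A BY statement in PROC PRINT needs the data sorted by Region first */
proc sort data=a16.a16_sales;
   by Region;
run;

proc print data=a16.a16_sales;
   by Region;
   var region quantity TotalSales;
   sum Quantity TotalSales;
run;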

Output


Chapter 14 - Problem 3;

title "First 10 Observations";
title2 "Admitted in September 2004";
title3 "Aged 83 years or older";
proc print data=a16.a16_hosp n="Number of Patients" label;
   where year(admitdate) eq 2004 and
         month(admitdate) eq 9 and
         yrdif(dob, admitdate, 'actual') ge 83;
   id subject;
   label dob = "Date of Birth"
         admitdate = "Admission Date"
         dischrdate = "Discharge Date";
   var dob admitdate dischrdate;
run;

Output


Chapter 15 - Problem 1;

proc report data=a16.a16_blood(obs=5) nowd headline;
column subject wbc rbc;
define wbc/display"White Blood Cell";
define rbc/ display "red blood cell";
run;

Output



Chapter 15 - Problem 2;

proc report data=a16.a16_blood headline nowd;
   column gender wbc rbc;
   define gender / group;
   /* A column takes a single usage: ANALYSIS here, so the conflicting DISPLAY keyword is dropped */
   define wbc / analysis mean format=6.3 "White Blood Cell";
   define rbc / analysis mean format=6.3 "Red Blood Cell";
   break after gender / ol summarize skip;
run;
proc print data=a16.a16_blood;
run;

Output



Chapter 15 - Problem 3;

proc report data=a16.a16_hosp headline nowd;
column subject admitdate dob age;
define admitdate/ display "admission date";
define dob/ display "date of birth";
define age/computed;
compute age;
age=yrdif(dob,admitdate,"actual");
endcomp;
run;

Output
2 comments:

  1. Your blog will look much better, if you:

    1. Make it structured. You can have separate page for each chapter and have an index page, linking the pages (for the chapters).  You can also add links at each chapter to go back to index, previous (if any) and next (if any) pages.
    2. Introduce the chapter (i.e. the page) in your own words.
    3. For each problem, explain the problem you are attempting to solve.
    4. You can have screenshots of the programs as well.
    5. For each problem solved, write what you learned by solving the problem.  Even one line is fine for this. 
    6. Write a conclusion or summary for each page.

    Remember to :
    1. Number the problems solved by giving Chapter # and Problem #  (just like you did)
    2. "Beautify" the programs before posting.  There is a menu item in SAS Studio that does the beautification for you.  Also, you may like to post the programs the way you have posted the results.

    It would help, if you can think of this exercise as making your class notes and homework copy, combined.

    I understand it is a lot of work now, given that you are trying to do it at the eleventh hour.  For the next submission, if time permits, you can give it a try.

    Cheers

  2. Thanks for the feedback, sir. I shall definitely incorporate all these things in my next submission.