## Introduction to Statistical Computing in Clinical Research

Biostatistics 212, Lecture 1

**Today...**

• Course overview
• Course objectives
• Course details: grading, homework, etc.
• Schedule, lecture overview
• Where does Stata fit in?
• Basic data analysis with Stata
• Stata demos
• Lab

**Course Objectives**

• Introduce you to using STATA and Excel for
  • Data management
  • Basic statistical and epidemiologic analysis
  • Turning raw data into presentable tables, figures and other research products
• Prepare you for Fall courses
• Start analyzing your own data

**Course details**

Introduction to Statistical Computing - 1 unit

• Schedule – 7 lectures, 7 lab sessions, on 7 Tuesdays in a row
• Dates: August 3 – September 14
• Lectures 1:15-2:45
• Labs 3:00-4:00
• All in China Basin, CBL 6702 (+ 6704 for lab)
• Final Project Due 9/21/10

**Course details**

Introduction to Statistical Computing

• Grading: Satisfactory/Unsatisfactory
• Requirements:
  • Hand in all six Labs (even if late)
  • Satisfactory Final Project
  • 80% of total points
• Reading: Optional

**Course details, cont**

• Course Director
  • Mark Pletcher
• Teaching Assistants
  • Elizabeth Mileti – Section 1
  • Raman Khanna – Section 1
  • David Moskowitz – Section 2
  • Yvette Wild – Section 2
• Lecturers
  • Andy Choi
  • Jennifer Cocohoba
• Lab Instructor
  • Alan Bostrom
  • Mandana Khalili

**Course details, cont**

• Lecture
  • Extra-full this year!
• Labs
  • PC vs. Mac (Section 1 and Section 2)
  • All of Section 2 won’t fit into 6704…

**Overview of lecture topics**

• 1- Introduction to STATA
• 2- Do files, log files, and workflow in STATA
• 3- Generating variables and manipulating data with STATA
• 4- Using Excel
• 5- Basic epidemiologic analysis with STATA
• 6- Making a figure with STATA/Advanced Programming Topics
• 7- Organizing a project, making a table

**Overview of labs**

• Lab 1 – Load a dataset and analyze it
• Lab 2 – Learn how to use do and log files
• Lab 3* – Import data from Excel, generate new variables and manipulate data, document everything with do and log files.
• Lab 4 – Using and creating Excel spreadsheets
• Lab 5* – Epidemiologic analysis using Stata
• Lab 6 – Making a figure with Stata

Last lab session will be dedicated to working on the Final Project

* - Labs 3 and 5 are significantly longer and harder than the others

**Overview of labs, cont**

• Official Lab time is 3:00-4:00, but we will start right after lecture, and you can leave when you are done.

**Overview of labs, cont**

• Labs are due the following week, prior to lecture. Labs turned in less than 1 week late will receive only half credit; after that, no points will be awarded. However, ALL labs must be turned in to pass the class (even if no points are awarded).
• Lab 1 is paper
• Labs 2-6 are electronic files, and should be emailed to your section leader’s course email address: biostat212_section1@yahoo.com (Elizabeth/Raman) or biostat212_section2@yahoo.com (David/Yvette)

**Final Project**

• Create a Table and a Figure using your own data, and document the analysis using Stata.
• Due 1 week after the last lab session; 20 points docked for each day late.

**Course Materials**

• Online Syllabus (http://rds.epi-ucsf.org/ticr/syllabus/display.asp?academic_year=2010-2011&courseid=38)
  • Course Overview
  • Final Project
  • Miscellaneous handouts
  • Lectures and Labs/Datasets (“just in time”)

**Getting started with STATA**

Session 1

**Types of software packages used in clinical research**

• Statistical analysis packages
• Spreadsheets
• Database programs
• Custom applications
• Cost-effectiveness analysis (TreeAge, etc)
• Survey analysis (SUDAAN, etc)

**Software packages for analyzing data**

• STATA
• SAS
• S-PLUS and R
• SPSS
• SUDAAN
• Epi-Info
• JMP
• MATLAB
• StatXact

**Why use STATA?**

• Quick start, user friendly
• Immediate results, response
• You can look at the data
• Menu-driven option
• Good graphics
• Log and do files
• Good manuals, help menu

**Why NOT use STATA?**

• SAS is used more often?
• SAS does some things STATA does not
• Programming easier with S-PLUS and R?
• R is free
• Complicated data structure and manipulation easier with SAS?
• Epi-Info (free) is even easier than STATA?

**STATA – Basic functionality**

• Holds data for you
  • Stata holds 1 “flat” file dataset only (.dta file)
• Listens to what you want
  • Type a command, press enter
• Does stuff
  • Statistics, data manipulation, etc
• Shows you the results
  • Results window

**Demo #1**

• Open the program
• Entering vs. loading data
• Look at data
• Run a command
• Orient to windows and buttons

**STATA - Windows**

• Two basic windows
  • Command
  • Results
• Optional windows
  • Variable list
  • History of commands
• Other functions
  • Data browser/editor
  • Do file editor
  • Viewer (for log, help files, etc)

**STATA - Buttons**

• The usual – open, save, print
• Log-file open/suspend/close
• Do-file editor
• Browse and Edit
• Break

**STATA - Menus**

• Almost every command can be accessed via menu

**Menu vs. Command line**

• Menu advantages
  • Look for commands you don’t know about
  • See the options for each command
  • Complex commands easier – learn syntax
• Command line advantages
  • Faster (if you know the command!)
  • “Closer” to the program
  • Only way to write “do” files
  • Document and repeat analyses

**Demo #2**

• Load a STATA dataset
• Explore the data
• Describe the data
• Answer some simple research questions
  • Gender, BMI, blood pressure

**STATA commands: Describing your data**

• describe [varlist]
  • Displays variable names, types, labels
• list [varlist]
  • Displays the values of all observations
• codebook [varlist]
  • Displays labels and codes for all variables

**STATA commands: Descriptive statistics – continuous data**

• summarize [varlist] [, detail]
  • # obs, mean, SD, range
  • “, detail” gets you more detail (median, etc)
• ci [varlist]
  • Mean, standard error of mean, and confidence intervals
  • Actually works for dichotomous variables, too.

**STATA commands: Graphical exploration – continuous data**

• histogram varname
  • Simple histogram of your variable
• graph box varlist
  • Box plot of your variable
• qnorm varname
  • Quantile plot of your variable to check normality

**STATA commands: Descriptive statistics – categorical data**

• tabulate [varname]
  • Counts and percentages
  • (see also table, which is a very different command!)

**STATA commands: Analytic statistics – 2 categorical variables**

• tabulate [var1] [var2]
  • “Cross-tab”
  • Descriptive options
    • , row (row percentages)
    • , col (column percentages)
  • Statistics options
    • , chi2 (chi-squared test)
    • , exact (Fisher’s exact test)

**Getting help**

• Try to find the command on the pull-down menus
• Help menu
  • If you don’t know the command - Search...
  • If you know the command - Stata command...
• Try the manuals
  • More detail, theoretical underpinnings, etc

**STATA commands: Analytic statistics – 1 categorical, 1 continuous**

• bysort catvar: summarize [contvar]
  • Mean, SD, and range of the continuous variable within each subgroup
• ttest [contvar], by(catvar)
  • t-test
• oneway [contvar] [catvar]
  • ANOVA
• table [catvar] [, contents(mean [contvar] …)]
  • Table of statistics

**STATA commands: Analytic statistics – 2 continuous**

• scatter [var1] [var2]
  • Scatterplot of the two variables
• pwcorr [varlist] [, sig]
  • Pairwise correlations between variables
  • “sig” option gives p-values
• spearman [varlist] [, stats(rho p)]
  • Spearman rank correlations between variables

**In Lab Today…**

• Expect some chaos!
• IT will be here to help with wireless, logins, etc
• Familiarize yourself with Stata
• Load a dataset
• Use Stata commands to analyze data and fill in the blanks

**Next week**

• Do files, log files, and workflow in Stata
• Find a dataset!

**Website addresses**

• Course website
  • http://www.epibiostat.ucsf.edu/courses/schedule/biostat212.html
• Computing information
  • http://www.epibiostat.ucsf.edu/courses/ChinaBasinLocation.html#computing
• Download RDP for Macs (for Stata Server)
  • http://www.microsoft.com/mac/otherproducts/otherproducts.aspx?pid=remotedesktopclient
• Citrix Web Server
  • http://apps.epi-ucsf.org/
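
The commands introduced in this lecture can be tried in sequence in the Command window. The following is only a sketch: it assumes the `auto` example dataset that ships with Stata, not the course dataset used in the demos, so the variable names (`mpg`, `weight`, `price`, `foreign`, `rep78`) are stand-ins for whatever continuous and categorical variables your own data contain.

```stata
* Illustration only: the auto dataset is an assumption, not the course data
sysuse auto, clear

* Describing your data
describe
codebook foreign
list make mpg foreign in 1/5

* Descriptive statistics and graphs for a continuous variable
summarize mpg, detail
* Stata 14 and later use "ci means"; older releases use: ci mpg
ci means mpg
histogram mpg
graph box mpg
qnorm mpg

* Descriptive statistics for a categorical variable
tabulate foreign

* Two categorical variables: cross-tab with row percentages and tests
tabulate foreign rep78, row chi2 exact

* One categorical, one continuous variable
bysort foreign: summarize mpg
ttest mpg, by(foreign)
oneway mpg rep78

* Two continuous variables
scatter mpg weight
pwcorr mpg weight price, sig
spearman mpg weight, stats(rho p)
```

The `table ..., contents()` form shown on the slides is left out here because the `table` command was redesigned in Stata 17; check `help table` for the syntax your version accepts.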