Ggplot makes it easy to add linear regression lines to a plot. A linear regression line is a very simple way to visualize the direction and magnitude of a relationship between two variables. It helps you say things like: “If variable X goes up by one unit, then variable Y tends to also go up by Z units” or “If variable X goes up by one unit, then variable Y tends to go down by Z units”.

It is important to note that sometimes this type of linear relationship can be too simple to effectively sum up more complex relationships between variables. To address this concern, we’ll look at an alternative approach in a minute. But for now, let’s look at how to add a simple linear regression line to a ggplot graphic.

Before we get started, let’s again load both the dplyr and ggplot libraries, as well as the Minneapolis buildings energy benchmarking dataset:

    library(dplyr)
    library(ggplot2)

A handful of buildings in the dataset have very high site EUI values, above 300:

    data %>% filter(site_EUI > 300)
    # org_name prop_name
    # 1 City of Minneapolis Water Treatment and Distribution Campus
    # public_private address zip_code energy_star_score
    # 1 Public 4500 Marshall Street NE 55421 0
    # prop_type floor_area floor_area_parking
    # 1 Drinking Water Treatment & Distribution 650000 0
    # 3 Hospital (General Medical & Surgical) 82807 0
    # year_built total_GHG_emissions site_EUI weather_normalized_site_EUI
    # source_EUI weather_normalized_source_EUI water_use

Let’s now change something: let’s restrict the limits of the Y axis so it ranges from 0 to 300 kBtu/sq. ft. This will eliminate the handful of outliers that have very high site EUI values. We will want to make a note about these outliers elsewhere in the analysis, but restricting the Y axis limits for now will allow us to “zoom in” and see more details of the relationship between the buildings’ year built and their site EUI values:

    ggplot(data, aes(x=year_built, y=site_EUI)) +

The resulting plot is “zoomed in” a bit, making it a little easier to see the regression line. You can now see that the line is quite flat and the points scatter more or less randomly above and below it. This indicates that there is no consistent relationship between the year a building was built and its site EUI. Our buildings’ year_built, then, does not help explain much, if any, of the variation across the buildings’ site_EUI values.

Now, let’s add a color (col=) argument to the aesthetic. This will group the data, generate a separate regression line for each group, and then color the points and regression lines based on these groups. For example, we can try using the public_private variable as our color grouping variable.
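To make the steps in this walkthrough concrete, here is a rough sketch of how they might fit together. It rests on several assumptions. The benchmarking file itself is not reproduced here, so `toy` is a small made-up data frame that only borrows the column names `year_built`, `site_EUI`, and `public_private`. Drawing the line with `geom_smooth(method = "lm")` and restricting the axis with `ylim()` is one reasonable way to do this in ggplot2, not necessarily the exact code used for the plots in this post.

```r
library(ggplot2)

# Hypothetical stand-in data: column names borrowed from the benchmarking
# dataset, values made up purely for illustration.
set.seed(1)
toy <- data.frame(
  year_built     = sample(1900:2015, 60, replace = TRUE),
  site_EUI       = c(runif(57, 20, 250), 450, 600, 900),  # last three act as outliers
  public_private = rep(c("Public", "Private"), 30)
)

# Slope of a simple linear regression: the change in site_EUI that goes
# with a one-year increase in year_built.
fit   <- lm(site_EUI ~ year_built, data = toy)
slope <- coef(fit)[["year_built"]]

# Scatterplot with a single linear regression line.
p_all <- ggplot(toy, aes(x = year_built, y = site_EUI)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# ylim() removes rows outside 0-300 before the line is fitted,
# so the outliers no longer influence the regression line.
p_zoom <- p_all + ylim(0, 300)

# Mapping col= to a grouping variable colors the points and
# fits one regression line per group.
p_grouped <- ggplot(toy, aes(x = year_built, y = site_EUI, col = public_private)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  ylim(0, 300)
```

Typing `p_zoom` or `p_grouped` at the console (or calling `print()` on them) draws the plot. One design choice to be aware of: because `ylim()` drops the out-of-range points, ggplot2 will warn about removed rows. If we wanted to zoom in while still fitting the line to all of the data, `coord_cartesian(ylim = c(0, 300))` would be the alternative.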