I've been following some tutorials on how to design REST APIs, but I still have some big questions marks. All these tutorials show resources with relatively simple hierarchies, and I would like to know how the principles used in those apply to a more complex one. Furthermore, they stay at a very high/architectural level. They barely show any relevant code, let alone the persistence layer. I'm specially concerned about database load/performance, as Gavin King said:
you will save yourself effort if you pay attention to the database at all stages of development
Let's say my application will provide training for Companies
. Companies
have Departments
and Offices
. Departments
have Employees
. Employees
have Skills
and Courses
, and certain Level
of certain skills are required to be able to sign for some courses. The hierarchy is as as follows, but with :
-Companies -Departments -Employees -PersonalInformation -Address -Skills (quasi-static data) -Levels (quasi-static data) -Courses -Address -Offices -Address
Paths would be something as:
companies/1/departments/1/employees/1/courses/1 companies/1/offices/1/employees/1/courses/1
Fetching a resource
So ok, when returning a company, I obviously don't return the whole hierarchy companies/1/departments/1/employees/1/courses/1
+ companies/1/offices/../
. I might return a list of links to the departments or the expanded departments, and have to take the same decission at this level: do I return a list of links to the department's employees or the expanded employees? That will depend on the number of departments, employees, etc.
Question 1: Is my thinking correct, is "where to cut the hierarchy" a typical engineering decission I need to make?
Now let's say that when asked GET companies/id
, I decide to return a list of links to the department collection, and the expanded office information. My companies don't have many offices, so joining with the tables Offices
and Addresses
shouldn't be a big deal. Example of response:
GET /companies/1 200 OK { "_links":{ "self" : { "href":"http://trainingprovider.com:8080/companies/1" }, "offices": [ { "href": "http://trainingprovider.com:8080/companies/1/offices/1"}, { "href": "http://trainingprovider.com:8080/companies/1/offices/2"}, { "href": "http://trainingprovider.com:8080/companies/1/offices/3"} ], "departments": [ { "href": "http://trainingprovider.com:8080/companies/1/departments/1"}, { "href": "http://trainingprovider.com:8080/companies/1/departments/2"}, { "href": "http://trainingprovider.com:8080/companies/1/departments/3"} ] } "name":"Acme", "industry":"Manufacturing", "description":"Some text here", "offices": { "_meta":{ "href":"http://trainingprovider.com:8080/companies/1/offices" // expanded offices information here } } }
At the code level, this implies that (using Hibernate, I'm not sure how it is with other providers, but I guess that's pretty much the same) I won't put a collection of Department
as a field in my Company
class, because:
- As said, I'm not loading it with
Company
, so I don't want to load it eagerly - And if I don't load it eagerly, I might as well remove it, because the persistence context will close after I load a Company and there is no point in trying to load it afterwards (
LazyInitializationException
).
Then, I'll put a Integer companyId
in the Department
class, so that I can add a department to a company.
Also, I need to get the ids of all the departments. Another hit to the DB but not a heavy one, so should be ok. The code could look like:
@Service @Path("/companies") public class CompanyResource { @Autowired private CompanyService companyService; @Autowired private CompanyParser companyParser; @Path("/{id}") @GET @Consumes(MediaType.APPLICATION_JSON) @Produces(MediaType.APPLICATION_JSON) public Response findById(@PathParam("id") Integer id) { Optional<Company> company = companyService.findById(id); if (!company.isPresent()) { throw new CompanyNotFoundException(); } CompanyResponse companyResponse = companyParser.parse(company.get()); // Creates a DTO with a similar structure to Company, and recursivelly builds // sub-resource DTOs such as OfficeDTO Set<Integer> departmentIds = companyService.getDepartmentIds(id); // "SELECT id FROM departments WHERE companyId = id" // add list of links to the response return Response.ok(companyResponse).build(); } }
@Entity @Table(name = "companies") public class Company { @Id @GeneratedValue(strategy = GenerationType.IDENTITY) private Integer id; private String name; private String industry; @OneToMany(fetch = EAGER, cascade = {ALL}, orphanRemoval = true) @JoinColumn(name = "companyId_fk", referencedColumnName = "id", nullable = false) private Set<Office> offices = new HashSet<>(); // getters and setters }
@Entity @Table(name = "departments") public class Department { @Id @GeneratedValue(strategy = GenerationType.IDENTITY) private Integer id; private String name; private Integer companyId; @OneToMany(fetch = EAGER, cascade = {ALL}, orphanRemoval = true) @JoinColumn(name = "departmentId", referencedColumnName = "id", nullable = false) private Set<Employee> employees = new HashSet<>(); // getters and setters }
Updating a resource
For the update operation, I can expose an endpoint with PUT
or POST
. Since I want my PUT
to be idempotent, I can't allow partial updates. But then, if I want to modify the company's description field, I need to send the whole resource representation. That seems too bloated. The same when updating an employee's PersonalInformation
. I don't think it makes sense having to send all the Skills
+ Courses
together with that.
Question 2: Is PUT just used for fine-grained resources?
I've seen in the logs that, when merging an entity, Hibernate executes a bunch of SELECT
queries. I guess that's just to check if anything has changed and update whatever information needed. The upper the entity in the hierarchy, the heavier and more complex the queries. But some sources advise to use coarse grained resources. So again, I'll need to check how many tables are too much, and find a compromise between resource granularity and DB query complexity.
Question 3: Is this just another "know where to cut" engineering decission or am I missing something?
Question 4: Is this, or if not, what is the right "thinking process" when designing a REST service and searching for a compromise between resource granularity, query complexity and network chattiness?