How to design a complex REST API considering DB performance?

Question

I've been following some tutorials on how to design REST APIs, but I still have some big question marks. All these tutorials show resources with relatively simple hierarchies, and I would like to know how the principles used in those apply to a more complex one. Furthermore, they stay at a very high/architectural level. They barely show any relevant code, let alone the persistence layer. I'm especially concerned about database load/performance, as Gavin King said:

you will save yourself effort if you pay attention to the database at all stages of development

Let's say my application will provide training for Companies. Companies have Departments and Offices. Departments have Employees. Employees have Skills and Courses, and a certain Level of certain Skills is required to be able to sign up for some Courses. The hierarchy is as follows:

-Companies
  -Departments
    -Employees
      -PersonalInformation
        -Address
      -Skills (quasi-static data)
        -Levels (quasi-static data)
      -Courses
        -Address
  -Offices
    -Address

Paths would be something as:

companies/1/departments/1/employees/1/courses/1
companies/1/offices/1/employees/1/courses/1

Fetching a resource

So ok, when returning a company, I obviously don't return the whole hierarchy companies/1/departments/1/employees/1/courses/1 + companies/1/offices/../. I might return a list of links to the departments or the expanded departments, and have to take the same decision at this level: do I return a list of links to the department's employees or the expanded employees? That will depend on the number of departments, employees, etc.

Question 1: Is my thinking correct, is "where to cut the hierarchy" a typical engineering decision I need to make?

Now let's say that when asked GET companies/id, I decide to return a list of links to the department collection, and the expanded office information. My companies don't have many offices, so joining with the tables Offices and Addresses shouldn't be a big deal. Example of response:

    GET /companies/1

    200 OK
    {
      "_links": {
        "self": { "href": "http://trainingprovider.com:8080/companies/1" },
        "offices": [
          { "href": "http://trainingprovider.com:8080/companies/1/offices/1" },
          { "href": "http://trainingprovider.com:8080/companies/1/offices/2" },
          { "href": "http://trainingprovider.com:8080/companies/1/offices/3" }
        ],
        "departments": [
          { "href": "http://trainingprovider.com:8080/companies/1/departments/1" },
          { "href": "http://trainingprovider.com:8080/companies/1/departments/2" },
          { "href": "http://trainingprovider.com:8080/companies/1/departments/3" }
        ]
      },
      "name": "Acme",
      "industry": "Manufacturing",
      "description": "Some text here",
      "offices": {
        "_meta": {
          "href": "http://trainingprovider.com:8080/companies/1/offices"
          // expanded offices information here
        }
      }
    }

At the code level, this implies that (using Hibernate, I'm not sure how it is with other providers, but I guess that's pretty much the same) I won't put a collection of Department as a field in my Company class, because:

As said, I'm not loading it with Company, so I don't want to load it eagerly
And if I don't load it eagerly, I might as well remove it, because the persistence context will close after I load a Company and there is no point in trying to load it afterwards (LazyInitializationException).

Then, I'll put an Integer companyId in the Department class, so that I can add a department to a company.

Also, I need to get the ids of all the departments. Another hit to the DB but not a heavy one, so should be ok. The code could look like:

    import java.util.Optional;
    import java.util.Set;
    import javax.ws.rs.Consumes;
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Service;

    @Service
    @Path("/companies")
    public class CompanyResource {

        @Autowired
        private CompanyService companyService;

        @Autowired
        private CompanyParser companyParser;

        @Path("/{id}")
        @GET
        @Consumes(MediaType.APPLICATION_JSON)
        @Produces(MediaType.APPLICATION_JSON)
        public Response findById(@PathParam("id") Integer id) {
            Optional<Company> company = companyService.findById(id);
            if (!company.isPresent()) {
                throw new CompanyNotFoundException();
            }

            // Creates a DTO with a similar structure to Company, and recursively
            // builds sub-resource DTOs such as OfficeDTO
            CompanyResponse companyResponse = companyParser.parse(company.get());

            // "SELECT id FROM departments WHERE companyId = id"
            Set<Integer> departmentIds = companyService.getDepartmentIds(id);

            // add list of links to the response

            return Response.ok(companyResponse).build();
        }
    }

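    import static javax.persistence.CascadeType.ALL;
    import static javax.persistence.FetchType.EAGER;

    import java.util.HashSet;
    import java.util.Set;
    import javax.persistence.*;

    @Entity
    @Table(name = "companies")
    public class Company {

        @Id
        @GeneratedValue(strategy = GenerationType.IDENTITY)
        private Integer id;

        private String name;

        private String industry;

        // Offices are few per company, so they are loaded together with it
        @OneToMany(fetch = EAGER, cascade = {ALL}, orphanRemoval = true)
        @JoinColumn(name = "companyId_fk", referencedColumnName = "id", nullable = false)
        private Set<Office> offices = new HashSet<>();

        // getters and setters
    }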

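    import static javax.persistence.CascadeType.ALL;
    import static javax.persistence.FetchType.EAGER;

    import java.util.HashSet;
    import java.util.Set;
    import javax.persistence.*;

    @Entity
    @Table(name = "departments")
    public class Department {

        @Id
        @GeneratedValue(strategy = GenerationType.IDENTITY)
        private Integer id;

        private String name;

        // Plain foreign key value instead of a Company reference
        private Integer companyId;

        @OneToMany(fetch = EAGER, cascade = {ALL}, orphanRemoval = true)
        @JoinColumn(name = "departmentId", referencedColumnName = "id", nullable = false)
        private Set<Employee> employees = new HashSet<>();

        // getters and setters
    }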

Updating a resource

For the update operation, I can expose an endpoint with PUT or POST. Since I want my PUT to be idempotent, I can't allow partial updates. But then, if I want to modify the company's description field, I need to send the whole resource representation. That seems too bloated. The same when updating an employee's PersonalInformation. I don't think it makes sense having to send all the Skills + Courses together with that.

Question 2: Is PUT just used for fine-grained resources?

I've seen in the logs that, when merging an entity, Hibernate executes a bunch of SELECT queries. I guess that's just to check if anything has changed and update whatever information is needed. The higher the entity is in the hierarchy, the heavier and more complex the queries. But some sources advise using coarse-grained resources. So again, I'll need to check how many tables are too many, and find a compromise between resource granularity and DB query complexity.

Question 3: Is this just another "know where to cut" engineering decision or am I missing something?

Question 4: Is this, or if not, what is the right "thinking process" when designing a REST service and searching for a compromise between resource granularity, query complexity and network chattiness?

1. Yes; because REST calls are expensive, it's important to try and get the granularity right. — Robert Harvey, Commented Jan 19, 2016 at 16:21
2. No. The PUT verb has nothing to do with granularity, per se. — Robert Harvey, Commented Jan 19, 2016 at 16:22
4. The right thinking is "do what best meets your requirements for scalability, performance, maintainability and other issues." This might require some experimentation to find the sweet spot. — Robert Harvey, Commented Jan 19, 2016 at 16:22
Too long. Didn't read. Can this be split into 4 actual questions? — MetaFight, Commented Jan 19, 2016 at 16:33

Borys Serebrov · Accepted Answer · 2016-01-26 17:52:25Z

I think you have complexity because you are starting with over-complication:

Paths would be something as:
companies/1/departments/1/employees/1/courses/1
companies/1/offices/1/employees/1/courses/1

Instead I would introduce a simpler URL scheme like this:

GET companies/
Returns a list of companies; for each company, return short essential info (ID, name, maybe industry).

GET companies/1
Returns single company info like this:

    {
      "name": "Acme",
      "description": "Some text here",
      "industry": "Manufacturing",
      "departments": {
        "href": "/companies/1/departments",
        "count": 5
      },
      "offices": {
        "href": "/companies/1/offices",
        "count": 3
      }
    }

We don't expand the data for internal sub-resources; we just return the count, so the client knows that some data is present. In some cases the count may not be needed either.

GET companies/1/departments
Returns company departments, again with short info for each department.

GET departments/
Here you need to decide if it makes sense to expose a list of departments or not. If not, leave only the companies/X/departments method. Note that you can also use the query string to make this method "searchable", like:

/departments?company=1 - list of all departments for company 1
/departments?type=support - all 'support' departments for all companies

GET departments/1
Returns department 1 data.
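As a rough illustration of the "short info plus count" representation for a single company, here is a minimal Java sketch. The class and method names (CompanySummaryDemo, summarize, subResource) are hypothetical and not part of any framework, and the counts are passed in as plain numbers; in a real service they would presumably come from cheap count queries rather than from loading the child rows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class CompanySummaryDemo {

    // Builds a link object such as { "href": "/companies/1/departments", "count": 5 }.
    static Map<String, Object> subResource(String href, int count) {
        Map<String, Object> link = new LinkedHashMap<>();
        link.put("href", href);
        link.put("count", count);
        return link;
    }

    // Assembles the company representation without expanding sub-resources.
    // Assumption: the counts were obtained separately (e.g. by COUNT queries).
    static Map<String, Object> summarize(int id, String name, String description,
                                         String industry, int departmentCount,
                                         int officeCount) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("name", name);
        body.put("description", description);
        body.put("industry", industry);
        body.put("departments",
                subResource("/companies/" + id + "/departments", departmentCount));
        body.put("offices",
                subResource("/companies/" + id + "/offices", officeCount));
        return body;
    }

    public static void main(String[] args) {
        System.out.println(
                summarize(1, "Acme", "Some text here", "Manufacturing", 5, 3));
    }
}
```

A JSON library would serialize the returned map into the shape shown for GET companies/1; the point is only that the company query never has to join the department or office tables.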

This way it answers most of your questions - you "cut" the hierarchy right away and you don't bind your URL scheme to the internal data structure. For example, if we know employee ID, would you expect to query it like employees/:ID or like companies/:X/departments/:Y/employees/:ID?
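The "searchable" variant of GET departments/ mentioned in the list above could be sketched roughly as follows. This is an in-memory illustration with hypothetical names (DepartmentSearchDemo, search); a null argument stands for an absent query parameter, and in a real service the two conditions would more likely become optional WHERE clauses in the database query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for /departments?company=...&type=...
class DepartmentSearchDemo {

    static final class Department {
        final int id;
        final int companyId;
        final String type;

        Department(int id, int companyId, String type) {
            this.id = id;
            this.companyId = companyId;
            this.type = type;
        }
    }

    // A null filter means the query parameter was not supplied.
    static List<Department> search(List<Department> all, Integer company, String type) {
        List<Department> result = new ArrayList<>();
        for (Department d : all) {
            boolean companyMatches = company == null || d.companyId == company;
            boolean typeMatches = type == null || type.equals(d.type);
            if (companyMatches && typeMatches) {
                result.add(d);
            }
        }
        return result;
    }

    static List<Department> sampleData() {
        return Arrays.asList(
                new Department(1, 1, "support"),
                new Department(2, 1, "sales"),
                new Department(3, 2, "support"));
    }

    public static void main(String[] args) {
        // /departments?company=1
        System.out.println(search(sampleData(), 1, null).size());
        // /departments?type=support
        System.out.println(search(sampleData(), null, "support").size());
    }
}
```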

Regarding PUT vs POST requests, from your question it is clear that you feel partial updates will be more efficient for your data. So I would just use POSTs.

In practice, you actually want to cache data reads (GET requests); caching is less important for data updates. And updates often can't be cached regardless of what type of request you use (for example, if the server automatically sets the update time, it will be different for every request).
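The answer does not say how the GET responses should be cached. One common HTTP mechanism is a conditional GET based on an ETag, sketched below under that assumption. The names ConditionalGetDemo, etagFor and statusFor are hypothetical, and the ETag is simply a hash of the response body, which is only one of several ways to derive it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of ETag-based conditional GET handling.
class ConditionalGetDemo {

    // Derives an ETag from the response body: a quoted hex SHA-256 digest.
    static String etagFor(String body) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(body.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder("\"");
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Returns the status a GET would answer with: 304 when the client's
    // If-None-Match header value equals the current ETag, 200 otherwise.
    static int statusFor(String currentBody, String ifNoneMatch) {
        return etagFor(currentBody).equals(ifNoneMatch) ? 304 : 200;
    }

    public static void main(String[] args) {
        String body = "{\"name\":\"Acme\"}";
        String etag = etagFor(body);
        System.out.println(statusFor(body, null));                   // first request
        System.out.println(statusFor(body, etag));                   // unchanged resource
        System.out.println(statusFor("{\"name\":\"Acme2\"}", etag)); // resource changed
    }
}
```

With this scheme an unchanged company costs the client only a small 304 response, while an update naturally invalidates the cached copy because the body, and therefore the ETag, changes.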

Update: regarding the right "thinking process" - since it is based on HTTP, we can apply the regular way of thinking used when designing a website structure. In this case, at the top we can have a list of companies and show a short description for each, with a link to the "view company" page, where we show company details and links to offices / departments and so on.

dagnelies · Accepted Answer · 2016-01-22 20:42:23Z

IMHO, I think you're missing the point.

First, the REST API and DB performance are unrelated.

The REST API is just an interface, it does not define at all how you do stuff under the hood. You can map it to any DB structure you like behind it. Therefore:

design your API so that it's easy for the user
design your database so that it can scale reasonably:
- ensure that you have the right indexes
- if you store objects, just ensure they're not too huge.

That's it.

...and lastly, this smells like premature optimization. Keep it simple, try it out, and adapt it if need be.

Community · Accepted Answer · 2021-10-07 07:34:52Z

Question 1: Is my thinking correct, is "where to cut the hierarchy" a typical engineering decision I need to make?

Maybe - I'd be worried that you are going about it backwards, though.

So ok, when returning a company, I obviously don't return the whole hierarchy

I don't think that's obvious at all. You should be returning representation(s) of company appropriate for the use cases you are supporting. Why wouldn't you? Does it really make sense that the api depends on the persistence component? Isn't part of the point that the clients don't need to be exposed to that choice in the implementation? Are you going to preserve a compromised api when you swap out one persistence component for another?

That said, if your use cases don't need the whole hierarchy, there's no need to return it. In an ideal world, the api would produce representations of company that are perfectly suited to the immediate needs of the client.

Question 2: Is PUT just used for fine-grained resources?

Pretty much - communicating the idempotent nature of a change by implementing it as a PUT is nice, but the HTTP specification allows agents to make assumptions about what is really happening.

Note this comment from RFC 7231

A PUT request applied to the target resource can have side effects on other resources.

In other words, you can PUT a message (a "fine-grained resource") that describes a side effect to be executed on your primary resource (entity). You need to take some care to ensure your implementation is idempotent.
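One way to read this is sketched below in Java, using the question's domain: the client PUTs a small "enrollment request" message under an ID it chooses, and the server executes the side effect on the employee only the first time that ID is seen. Everything here (the class EnrollmentMessagesDemo, the URL shape in the comment, and answering 409 when an ID is reused with different content) is an assumption made for illustration, not something the answer or the RFC prescribes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: PUT /employees/1/enrollment-requests/{messageId}
class EnrollmentMessagesDemo {

    // messageId -> courseId of every message received so far
    private final Map<String, Integer> messages = new HashMap<>();

    // The primary resource that the messages act upon.
    private final List<Integer> enrolledCourses = new ArrayList<>();

    // Returns an HTTP-like status: 201 when the message is new and the
    // side effect ran, 200 when the same message is replayed (no further
    // effect), 409 when the ID is reused with different content.
    int putEnrollmentRequest(String messageId, int courseId) {
        Integer existing = messages.get(messageId);
        if (existing == null) {
            messages.put(messageId, courseId);
            enrolledCourses.add(courseId); // side effect on the employee
            return 201;
        }
        return existing == courseId ? 200 : 409;
    }

    int enrollmentCount() {
        return enrolledCourses.size();
    }

    public static void main(String[] args) {
        EnrollmentMessagesDemo employee = new EnrollmentMessagesDemo();
        System.out.println(employee.putEnrollmentRequest("msg-1", 42)); // 201
        System.out.println(employee.putEnrollmentRequest("msg-1", 42)); // 200
        System.out.println(employee.enrollmentCount());                 // 1
    }
}
```

Because a retry of the same PUT leaves the employee with a single enrollment, the client can safely resend the request after a timeout.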

Question 3: Is this just another "know where to cut" engineering decision or am I missing something?

Maybe. It might be trying to tell you that your entities are not scoped correctly.

Question 4: Is this, or if not, what is the right "thinking process" when designing a REST service and searching for a compromise between resource granularity, query complexity and network chattiness?

This doesn't feel right to me, insofar as it seems you are trying to tightly couple your resource scheme to your entities, and are letting your choice of persistence drive your design, rather than the other way around.

HTTP is fundamentally a document application; if the entities in your domain are documents, then great - but if the entities aren't documents, then you need to think. See Jim Webber's talk: REST in Practice, particularly starting from 36m40s.

That's your "fine grained" resource approach.

In your answer to question 1, why do you say I might be going backwards? — user3748908, Commented Jan 20, 2016 at 15:05
Because it sounded like you were trying to fit the requirements to the persistence layer constraint, instead of the other way around. — VoiceOfUnreason, Commented Jan 20, 2016 at 15:58

Aj_76 · Accepted Answer · 2016-01-23 00:24:22Z

In general, you don't want any implementation details exposed in the API. msw and VoiceofUnreason's answers are both communicating that, so it's important to pick up on.

Keep in mind the principle of least astonishment, especially since you're worried about idempotence. Take a look at some of the comments in the article you posted (https://stormpath.com/blog/put-or-post/); there is a lot of disagreement there about how the article presents idempotence. The big idea that I would take away from the article is that "Identical PUT requests should cause identical results". That is, if you PUT an update to a company's name, the company's name changes and nothing else changes for that company as a result of that PUT. The same request 5 minutes later should have the same effect.

An interesting question to think about (check out gtrevg's comment in the article): any PUT request, including a full update, will modify dateUpdated even if a client doesn't specify it. Wouldn't that make any PUT request violate idempotency?
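To make the dateUpdated question concrete, here is a small Java sketch with hypothetical names (IdempotentPutDemo, sameClientState). It assumes a full-replacement PUT and uses a simple counter as a stand-in for the server clock. Repeating the identical PUT leaves the client-supplied state the same while the server-managed dateUpdated field moves on.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store illustrating a full-replacement PUT.
class IdempotentPutDemo {

    static final class Company {
        final String name;
        final String description;
        final long dateUpdated; // set by the server, never by the client

        Company(String name, String description, long dateUpdated) {
            this.name = name;
            this.description = description;
            this.dateUpdated = dateUpdated;
        }

        // Compares only the fields the client is allowed to supply.
        boolean sameClientState(Company other) {
            return name.equals(other.name) && description.equals(other.description);
        }
    }

    private final Map<Integer, Company> store = new HashMap<>();
    private long clock = 0; // stand-in for a real timestamp source

    // Replaces the whole company and stamps it with a new update time.
    Company put(int id, String name, String description) {
        Company saved = new Company(name, description, ++clock);
        store.put(id, saved);
        return saved;
    }

    public static void main(String[] args) {
        IdempotentPutDemo api = new IdempotentPutDemo();
        Company first = api.put(1, "Acme", "Some text here");
        Company second = api.put(1, "Acme", "Some text here");
        System.out.println(first.sameClientState(second));          // true
        System.out.println(first.dateUpdated == second.dateUpdated); // false
    }
}
```

Whether the changing timestamp counts as a violation depends on whether dateUpdated is considered part of the state the client asked for, which is exactly the disagreement described above.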

So back to the API. General things to think about:

Implementation details exposed in the API should be avoided
If the implementation changes, your API should still be intuitive and easy-to-use
Documentation is important
Try not to warp the API to get performance improvements

minor aside: idempotence is contextually bound. As an example, Logging and Audit processes can be triggered inside a PUT and these actions are non-idempotent. But these are internal implementation details and do not impact representations that are exposed through the service abstraction; therefore as far as the API is concerned, the PUT is idempotent. — K. Alan Bates, Commented Jan 26, 2016 at 16:18

itsraghz · Accepted Answer · 2016-01-19 18:55:08Z

For your Q1 on where to cut the hierarchy, how about using the unique ID of an entity, which would otherwise give you the required details on the backend? For example, "companies/1/department/1" will have a unique identifier of its own (or we can assign one to represent it) that gives you the hierarchy; you can use that.

For your Q3 on PUT carrying the full, bloated representation, you can flag the fields that were updated and send that as additional metadata to the server, which can then inspect it and update only those fields.
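A minimal Java sketch of this "flag the updated fields" idea is shown below. The class FieldMaskUpdateDemo, the method applyUpdate, and the use of plain string maps for the resource are all hypothetical simplifications; the list of flagged field names plays the role of the extra metadata.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: update only the fields the client flagged as changed.
class FieldMaskUpdateDemo {

    // Copies the flagged fields from the incoming representation onto the
    // current one; every other field keeps its stored value.
    static Map<String, String> applyUpdate(Map<String, String> current,
                                           Map<String, String> incoming,
                                           List<String> updatedFields) {
        Map<String, String> result = new HashMap<>(current);
        for (String field : updatedFields) {
            if (!current.containsKey(field)) {
                throw new IllegalArgumentException("Unknown field: " + field);
            }
            result.put(field, incoming.get(field));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> current = new HashMap<>();
        current.put("name", "Acme");
        current.put("industry", "Manufacturing");
        current.put("description", "Some text here");

        Map<String, String> incoming = new HashMap<>();
        incoming.put("description", "New description");

        Map<String, String> updated =
                applyUpdate(current, incoming, Arrays.asList("description"));
        System.out.println(updated.get("name"));        // Acme
        System.out.println(updated.get("description")); // New description
    }
}
```

On the persistence side, the same list of flagged fields could be used to issue an UPDATE that touches only those columns, which addresses the concern about heavy merges higher up in the hierarchy.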
