This document describes anything noteworthy in the workings of the system. See also the system's master QA document.
The data assumed to be present in the database, in addition to the data in the System tables and the Settings table, is defined here.
This data can be modified and new records can be added, but doing so will have implications for site layout.
Type | Pattern | Rewrite | Alias |
---|---|---|---|
Read | ^(/0/) | /g/$1 | none |
Research | ^(/2/) | /g/$1 | none |
Study | ^(/1/) | /g/$1 | none |
Read/Research/Study Sections | ^(/[012]/[0-9]+/) | /s/$1 | none |
Read/Research/Study Authors | ^(/[012]/[0-9]+/[0-9]+/) | /a/$1 | none |
Read/Research/Study Books | ^(/[012]/[0-9]+/[0-9]+/[0-9]+/) | /a/$1 | none |
Read/Research chapter pages | ^(/[012]/[0-9]+/[0-9]+/[0-9]+/[0-9]+/[0-9]+\.html) | /a/$1 | none |
Study chapter pages | ^(/1/[0-9]+/[0-9]+/[0-9]+/[0-9]+/[0-9]+\.html) | /Login?continuationUrl=$1 | /b/Login?continuationUrl=$1 |
Search | ^(/4/([0-9-]+/)+) | /Search/$1 | /b/Search$1 |
Shop chapter pages | ^(/3/([0-9-]+/)([0-9-]+/)([0-9-]+/)([0-9-]+/)+) | /Shop/$1 | /b/Shop$1 |
Shop books/authors | ^(/3/([0-9-]+/)+) | /Shop/$1 | /b/Shop$1 |
Trolley | ^/trolley(/method) | /trolley/bibliomania/$1 | /b/Trolley/bibliomania/$1 |
Boards | ^/board/([0-9]+/method) | /board/$1 | /b/Board/$1 |
Messages | ^/webmacro/MessagePage\?db=paneris&id=([0-9]+/method) | /message$1 | /b/Message$1 |
Old URLs | ^/(Fiction etc) | /OldUrlRedirect$1 | /b/OldUrlRedirect$1 |
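For illustration, the first-match rewriting this table implies can be sketched in Java as follows; the `UrlRewriter` class and the rule subset shown are assumptions for exposition, not the production rewrite engine:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlRewriter {

  // Rules are tried in insertion order; the first matching pattern wins,
  // so the more specific patterns must be registered before catch-alls
  // such as ^(/0/), whatever order the table above is written in.
  private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
  static {
    RULES.put(Pattern.compile("^(/[012]/[0-9]+/[0-9]+/)"), "/a/$1");  // Authors
    RULES.put(Pattern.compile("^(/4/([0-9-]+/)+)"), "/Search/$1");    // Search
    RULES.put(Pattern.compile("^(/0/)"), "/g/$1");                    // Read
    // ... the remaining rows of the table would be registered likewise ...
  }

  /** Rewrites the given path, or returns it unchanged if no rule matches. */
  public static String rewrite(String path) {
    for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
      Matcher m = rule.getKey().matcher(path);
      if (m.find()) {
        return m.replaceFirst(rule.getValue());
      }
    }
    return path;
  }

  public static void main(String[] args) {
    // $1 captures the whole matched path, slashes included.
    System.out.println(rewrite("/0/12/34/"));  // prints /a//0/12/34/
    System.out.println(rewrite("/0/"));        // prints /g//0/
  }
}
```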
Access control is achieved using a cookie authentication scheme. In order to read the protected content you have to know a certain random number (like 27835628), which is sent in by your browser with every request; you can only acquire the number by:

1. registering normally and logging in, which sets it as a cookie in your browser;
2. hacking into the server and reading it out;
3. copying it from a browser that already has it;
4. snooping it off the network as it travels between a legitimate user's browser and the server.
At present the number isn't changed very often (in fact, only when the server is restarted), so the third option is theoretically feasible. We could make it harder by cancelling the number after a decent interval.
The second option is something out of the content-protection system's control. We need to be sure that the server is reasonably secure; if it isn't, no web-level access control is going to help.
The fourth access route is something that we just have to live with unless we go for a secure server. With credit card numbers you obviously have to; for our application it's overkill.
What you _can't_ do is look at the site in a vacuum and figure out how to access the protected content. And it's much, much easier just to register normally than to hack into the server, copy the magic number, or snoop it, so in practice no one will bother with the latter.
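A minimal sketch of such a check, assuming a javax.servlet-style filter; the class, the cookie name `magic`, and the redirect target are illustrative assumptions rather than the system's actual code (though the /b/Login?continuationUrl= form echoes the rewrite table above):

```java
import java.io.IOException;
import java.security.SecureRandom;
import javax.servlet.*;
import javax.servlet.http.*;

public class MagicNumberFilter implements Filter {

  // A fresh random number on every server restart, as described above.
  private static final String MAGIC =
      Long.toString(new SecureRandom().nextLong() & Long.MAX_VALUE);

  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest request = (HttpServletRequest) req;
    HttpServletResponse response = (HttpServletResponse) res;
    if (knowsTheNumber(request)) {
      chain.doFilter(req, res);  // browser sent the number: serve the content
    } else {
      // Otherwise the only way in is to log in, which sets the cookie.
      // (Real code would URL-encode the continuation URL.)
      response.sendRedirect("/b/Login?continuationUrl=" + request.getRequestURI());
    }
  }

  private boolean knowsTheNumber(HttpServletRequest request) {
    Cookie[] cookies = request.getCookies();
    if (cookies == null)
      return false;
    for (Cookie c : cookies)
      if ("magic".equals(c.getName()) && MAGIC.equals(c.getValue()))
        return true;
    return false;
  }

  public void init(FilterConfig config) {}
  public void destroy() {}
}
```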
A complex application inevitably has to make trade-offs under constraints as it pushes against the performance limits of its sub-components. These trade-offs are listed here explicitly.
Message 37898 |
---|
> Should we allow unlimited results? What if someone searches the whole site and puts in 'the'. Is there potential to clog up the system? There is a silent, hard limit of 50 chapters per search for essentially that reason. Recall that we return the chapters hit in "score" order (basically it favours more word/phrase occurrences per chapter rather than fewer, and it likes them to be clustered together). That means that we must in principle look at _every_ valid "hit", even if the user only wants to see the "first" (most relevant) one. As a simple and guaranteed effective way of avoiding overload when all the search terms entered are very common, we just stop scoring after we've found 50 chapters. If people see "at least 50 hits" and not the one they want, they should know to start putting in more discriminating keywords rather than laboriously paging through to the end. The 10 and 5 shown on the search page itself are quite different numbers. They simply control how many of the occurrence contexts within each chapter are displayed, which actually makes little difference to the load on the server. |
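In outline, the cut-off amounts to something like the following sketch; `Hit` and the candidate iteration are hypothetical stand-ins for the real fti engine:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SearchCutoff {

  static final int MAX_CHAPTERS = 50;  // the silent, hard limit

  static class Hit {
    final String chapterId;
    final float score;  // favours many, clustered occurrences per chapter
    Hit(String chapterId, float score) { this.chapterId = chapterId; this.score = score; }
  }

  /**
   * Ranking by score would in principle require visiting every hit, so to
   * bound the work when all the terms are very common we simply stop once
   * 50 chapters have been found.
   */
  static List<Hit> score(Iterable<Hit> candidates) {
    List<Hit> found = new ArrayList<>();
    for (Hit h : candidates) {
      found.add(h);
      if (found.size() >= MAX_CHAPTERS)
        break;  // report "at least 50 hits" rather than scan everything
    }
    found.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
    return found;
  }
}
```

The 10 and 5 context counts on the search page only affect how each of those chapters is then displayed, not this loop.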
Message 40180 |
---|
tim@hoop.co.uk writes: > so we should do something in Author.delete(), so that we reindex before deleting? otherwise, we are going to have to periodically reindex? Theoretically the current scheme does mean that when authors (in fact any Chapters at all) are deleted, you get "orphaned" search hits in the fti database which don't correspond to anything in the Postgres/POEM database. In practice, this effect is irrelevant since there are so few deletions, the phantom hits are silently ignored (or they are now that you have fixed the Author case), and the unindex/reindex cycle happens anyway when the textids in question get reused. (That is to say, when a new text with the same author id number, book-of-author sequence number, and chapter-of-book number gets imported.) |
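The shape of that fix is roughly as follows; `FtiIndex`, `Chapter`, and the method names are hypothetical stand-ins, not the actual POEM/fti API:

```java
/** Hypothetical handle on the fti database; the real API will differ. */
interface FtiIndex {
  void unindex(String textId);
}

/** Sketch: unindex before delete, so no search hit can outlive its chapter. */
class Chapter {
  private final FtiIndex index;
  private final int authorId, bookOfAuthor, chapterOfBook;

  Chapter(FtiIndex index, int authorId, int bookOfAuthor, int chapterOfBook) {
    this.index = index;
    this.authorId = authorId;
    this.bookOfAuthor = bookOfAuthor;
    this.chapterOfBook = chapterOfBook;
  }

  /** The textid is author id / book-of-author / chapter-of-book, as above. */
  String textId() {
    return authorId + "/" + bookOfAuthor + "/" + chapterOfBook;
  }

  void delete() {
    index.unindex(textId());  // remove the search entries first ...
    // ... then delete the Postgres/POEM row as before.
  }
}
```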