% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/db.R
\name{emr_db.connect}
\alias{emr_db.connect}
\alias{emr_db.init_examples}
\alias{emr_db.init}
\alias{emr_db.ls}
\title{Initializes connection with Naryn Database}
\usage{
emr_db.connect(db_dirs = NULL, load_on_demand = NULL, do_reload = FALSE)

emr_db.init(
  global.dir = NULL,
  user.dir = NULL,
  global.load.on.demand = TRUE,
  user.load.on.demand = TRUE,
  do.reload = FALSE
)

emr_db.ls()
}
\arguments{
\item{db_dirs}{vector of db directories}

\item{load_on_demand}{vector of booleans of the same length as \code{db_dirs}. If \code{load_on_demand[i]} is \code{FALSE}, tracks from \code{db_dirs[i]} are pre-loaded. Alternatively, a single \code{TRUE} or \code{FALSE} sets \code{load_on_demand} for all the databases. If \code{NULL} is passed, \code{load_on_demand} is set to \code{TRUE} for all the databases.}

\item{do_reload}{If \code{TRUE}, rebuilds DB index files.}

\item{global.dir, user.dir, global.load.on.demand, user.load.on.demand, do.reload}{old parameters of the deprecated function \code{emr_db.init}}
}
\value{
None.
}
\description{
Initializes a connection to one or more Naryn database directories.
}
\details{
Call \code{emr_db.connect} to establish access to the tracks in \code{db_dirs}.
To establish a connection, at least one db dir must be specified. Optionally,
\code{emr_db.connect} accepts additional db dirs, which may contain
additional tracks.

When two or more db dirs contain a track with the same name (namespace collision),
the track is taken from the db dir that was passed \emph{last} in the order of
connections.

For example, if two db dirs \code{/db1} and \code{/db2} both contain
a track named \code{track1}, the call \code{emr_db.connect(c('/db1', '/db2'))} results in
Naryn using \code{track1} from \code{/db2}. The override applies consistently not
only to the track's data, but also to any other Naryn entity that uses or points
to the track.
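
The collision rule above can be illustrated with a minimal sketch. It assumes
that the directories \code{/db1} and \code{/db2} from the example exist and
that both contain \code{track1}; neither is shipped with the package.

\preformatted{
library(naryn)

# '/db1' and '/db2' are the illustrative directories from the text above;
# both are assumed to exist and to contain a track named 'track1'.
emr_db.connect(c("/db1", "/db2"))

# List the currently connected db dirs.
emr_db.ls()

# 'track1' now refers to the copy in '/db2', the dir passed last.
emr_track.ls()
}

Reversing the order of the two directories in the call would make the copy in
\code{/db1} the one in use.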

Even though all the db dirs may contain track files, they serve different purposes.
All the db dirs except the last one in the order of connections are mainly read-only.
The directory that was connected last, also known as the \emph{user dir}, is
intended to store volatile data such as the results of intermediate calculations.

New tracks can be created only in the db dir that is last in the order of
connections, using \code{emr_track.import} or \code{emr_track.create}. To write tracks
to a db dir that is not last in the connection order, the user must explicitly
reconnect and set the required db dir as the last one. This should be done only for a
well-justified reason.

When the package is attached, it internally calls \code{emr_db.init_examples},
which sets a single example db dir: \code{PKGDIR/naryndb/test}
(\code{PKGDIR} is the directory where the package is installed).

Physical files in the database are supposed to be managed exclusively by
Naryn itself. Track files may be modified, added or deleted manually, but
such changes must be followed by a call to \code{emr_db.reload}. Some of these
manual changes, however (like moving a track from global space to user or
vice versa), might cause \code{emr_db.connect} to fail. \code{emr_db.reload} cannot be
invoked in that case, as it requires the connection to the DB to be established first.
To break the deadlock, pass \code{do_reload = TRUE} to \code{emr_db.connect}.
This connects to the DB and rebuilds the DB index files in one step.
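
A minimal sketch of the recovery step described above, assuming the
illustrative directories \code{/db1} and \code{/db2} and that track files were
changed manually in one of them:

\preformatted{
library(naryn)

# Connect and rebuild the DB index files in one step, for example after
# track files were moved manually between db dirs.
# '/db1' and '/db2' are illustrative paths, not part of the package.
emr_db.connect(c("/db1", "/db2"), do_reload = TRUE)
}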

If \code{load_on_demand} is \code{TRUE}, a track is loaded into memory only when it is
accessed, and it is unloaded from memory when the R session ends or the package is
unloaded.

If \code{load_on_demand} is \code{FALSE}, all the tracks from the specified
space (global / user) are pre-loaded into memory, making subsequent track
access significantly faster. As loaded tracks reside in shared memory, other
R sessions running on the same machine may also enjoy a significant run-time
boost. On the flip side, pre-loading all the tracks prolongs the execution
of \code{emr_db.connect} and requires enough memory to accommodate all the data.
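
The two forms of \code{load_on_demand} can be sketched as follows. The
directories \code{/db1} and \code{/db2} are illustrative and assumed to exist.

\preformatted{
library(naryn)

# Per-directory setting: pre-load the tracks of '/db1', load the tracks
# of '/db2' on demand. The vector has the same length as db_dirs.
emr_db.connect(c("/db1", "/db2"), load_on_demand = c(FALSE, TRUE))

# A single value applies to all db dirs: pre-load everything.
emr_db.connect(c("/db1", "/db2"), load_on_demand = FALSE)
}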

The choice between the two modes depends on the specific needs. While
\code{load_on_demand = TRUE} is a solid default choice, in an environment
with frequent short-lived R sessions, each accessing a track, one
might opt for running a "daemon" - an additional permanent R session. The
daemon would pre-load all the tracks in advance and stay alive, thus boosting
the run-time of sessions that start later.
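
One possible shape of such a daemon is sketched below. It is an assumption
about how the idea could be realized, not a mechanism provided by the package;
the directories and the sleep interval are illustrative.

\preformatted{
# Run in a separate, long-lived R session.
library(naryn)

# Pre-load all the tracks from all db dirs ('/db1' and '/db2' are
# illustrative paths).
emr_db.connect(c("/db1", "/db2"), load_on_demand = FALSE)

# Keep the session alive so that the pre-loaded tracks stay in memory.
repeat Sys.sleep(3600)
}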

Upon completion, the connection with the database is established and a few
variables are added to the \code{.naryn} environment. These variables should not be
modified by the user!

\tabular{ll}{
.naryn$EMR_GROOT \tab First db dir of tracks in the order of connections \cr
.naryn$EMR_UROOT \tab Last db dir of tracks in the order of connections (user dir) \cr
.naryn$EMR_ROOTS \tab Vector of directories (db_dirs) \cr
}

\code{emr_db.init} is the old version of this function which
is now deprecated.

\code{emr_db.ls} lists all the currently connected databases.
}
\seealso{
\code{\link{emr_db.reload}}, \code{\link{emr_track.import}},
\code{\link{emr_track.create}}, \code{\link{emr_track.rm}},
\code{\link{emr_track.ls}}, \code{\link{emr_vtrack.ls}},
\code{\link{emr_filter.ls}}
}
\keyword{~data}
\keyword{~database}
\keyword{~db}
