~~CLOSETOC~~
<html><font color=#990000 size="+2"><b>Data Engine Configuration for Text and Language Features</b></font></html>

The data engine exposes several runtime settings for configuring the capabilities and limits of the underlying [[wiki> Lucene Engine]] through Data Store configuration commands. To list these commands, run:

<sxh DSQL; gutter: false;>
list dataspace commands
</sxh>

and to change a setting, use:

<code>
set ..
</code>

<WRAP round tip>
Note that most of these settings are global and require an engine restart after they are changed.
</WRAP>

Windows users may check the variable by going to ''System>>Control Panel>>System>>Advanced system settings'' and clicking the ''Environment Variables'' button, or by opening a Windows Shell (DOS) window and setting the variable in the following way:



===text.engine.max_buffered_docs===

Determines the minimum number of documents that must be buffered in memory before they are flushed as a new segment. Larger values generally give faster indexing.
When this is set, the writer flushes every <code>max_buffered_docs</code> added documents.
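The Python sketch below models this document-count trigger. It is a toy illustration, not the data engine's or Lucene's code: the class ''DocCountBuffer'' and its methods are hypothetical, and a "segment" is simply a list of buffered document names.

<code python>
class DocCountBuffer:
    """Hypothetical model of flushing by document count (not Lucene code)."""

    def __init__(self, max_buffered_docs: int) -> None:
        # Lucene rejects enabled values below 2; the model does the same.
        if max_buffered_docs < 2:
            raise ValueError("max_buffered_docs must be at least 2")
        self.max_buffered_docs = max_buffered_docs
        self.pending = []   # documents buffered in memory
        self.segments = []  # each flush produces one new segment

    def add(self, doc: str) -> None:
        self.pending.append(doc)
        # Flush as soon as max_buffered_docs documents are buffered.
        if len(self.pending) >= self.max_buffered_docs:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.segments.append(self.pending)
            self.pending = []


buf = DocCountBuffer(max_buffered_docs=3)
for i in range(7):
    buf.add(f"doc{i}")
print([len(s) for s in buf.segments], len(buf.pending))  # [3, 3] 1
</code>

Seven additions with ''max_buffered_docs=3'' produce two three-document segments and leave one document buffered until the next flush.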

===text.engine.ram_buffered_size===

Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to disk.
For faster indexing it is generally best to flush by RAM usage instead of by document count, and to use as large a RAM buffer as you can.
The default value is 16 MB.
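A minimal Python sketch of flushing by RAM usage, assuming a simple per-document byte estimate (real engines account for RAM differently). It shows why RAM-based flushing is preferred: with a fixed budget, segments of large documents hold fewer documents than segments of small ones. The class ''RamBuffer'' is hypothetical; only the 16 MB default comes from the text above.

<code python>
MB = 1024 * 1024


class RamBuffer:
    """Hypothetical model of flushing by RAM usage (not Lucene code)."""

    def __init__(self, ram_buffer_size_mb: float = 16.0) -> None:
        self.limit_bytes = int(ram_buffer_size_mb * MB)
        self.used_bytes = 0
        self.pending_docs = 0
        self.segment_doc_counts = []  # documents written per flushed segment

    def add(self, doc_size_bytes: int) -> None:
        self.pending_docs += 1
        self.used_bytes += doc_size_bytes
        # Flush once the buffered documents use at least the configured RAM.
        if self.used_bytes >= self.limit_bytes:
            self.segment_doc_counts.append(self.pending_docs)
            self.pending_docs = 0
            self.used_bytes = 0


buf = RamBuffer(ram_buffer_size_mb=1.0)
for size in [300_000] * 4 + [50_000] * 30:
    buf.add(size)
print(buf.segment_doc_counts)  # [4, 21]
</code>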


===text.engine.ram_per_thread_hard_limit===

Sets the maximum memory consumption per thread; exceeding it triggers a forced flush.
The value must be less than 2 GB (2048 MB) because of the 32-bit signed integer based memory addressing used internally.
The default value is 1945 MB.
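The 2048 MB ceiling follows from simple arithmetic: a signed 32-bit integer can address at most 2^31 - 1 bytes, just under 2048 MB. The Python helper below is a hypothetical client-side check, assuming the value must also be positive; it is not the data engine's own validation.

<code python>
MB = 1024 * 1024
MAX_PER_THREAD_MB = 2**31 // MB  # 2048; 2**31 is one past the largest signed 32-bit int
DEFAULT_PER_THREAD_MB = 1945     # default quoted above


def check_per_thread_hard_limit(value_mb: int) -> int:
    """Hypothetical validator for text.engine.ram_per_thread_hard_limit (MB)."""
    if not 0 < value_mb < MAX_PER_THREAD_MB:
        raise ValueError(
            f"value must be greater than 0 and less than {MAX_PER_THREAD_MB} MB, "
            f"got {value_mb}"
        )
    return value_mb


print(MAX_PER_THREAD_MB)                                   # 2048
print(check_per_thread_hard_limit(DEFAULT_PER_THREAD_MB))  # 1945
</code>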


===text.engine.reader_pooling===

Enables or disables index reader pooling.
The default value is true.


===text.engine.use_compound_file===

Sets whether the index writer packs newly written segments into a compound file. The default is true.
Use false for batch indexing with very large RAM buffer settings.


===setMaxBufferedDocs(int maxBufferedDocs)===

Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment. Large values generally give faster indexing.
When this is set, the writer will flush every maxBufferedDocs added documents. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to number of buffered documents. 
Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first.
Disabled by default (writer flushes by RAM usage).
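The "whichever comes first" rule can be sketched in Python as below. ''should_flush'' is a hypothetical helper, not a Lucene method; it reuses -1 as the disable sentinel because Lucene defines ''DISABLE_AUTO_FLUSH'' with that value, and it assumes, as Lucene does, that at least one trigger must stay enabled.

<code python>
DISABLE_AUTO_FLUSH = -1  # sentinel; Lucene uses the same value


def should_flush(buffered_docs: int, ram_used_mb: float,
                 max_buffered_docs: int = DISABLE_AUTO_FLUSH,
                 ram_buffer_size_mb: float = 16.0) -> bool:
    """Hypothetical check: flush when whichever enabled trigger fires first."""
    if (max_buffered_docs == DISABLE_AUTO_FLUSH
            and ram_buffer_size_mb == DISABLE_AUTO_FLUSH):
        raise ValueError("at least one flush trigger must be enabled")
    by_count = (max_buffered_docs != DISABLE_AUTO_FLUSH
                and buffered_docs >= max_buffered_docs)
    by_ram = (ram_buffer_size_mb != DISABLE_AUTO_FLUSH
              and ram_used_mb >= ram_buffer_size_mb)
    return by_count or by_ram


print(should_flush(1000, 2.0))                          # False: count disabled, RAM below 16 MB
print(should_flush(1000, 2.0, max_buffered_docs=1000))  # True: count limit reached first
print(should_flush(10, 16.5, max_buffered_docs=1000))   # True: RAM limit reached first
</code>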


===setRAMBufferSizeMB(double ramBufferSizeMB)===

Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory. Generally for faster indexing performance it's best to flush by RAM usage instead of document count and use as large a RAM buffer as you can.
When this is set, the writer will flush whenever buffered documents and deletions use this much RAM. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first.
The maximum RAM limit is inherently determined by the JVM's available memory. Yet an IndexWriter session can consume significantly more memory than the given RAM limit, since this limit only indicates when to flush memory-resident documents to the Directory. Flushes are likely to happen concurrently while other threads are adding documents to the writer. For application stability, the memory available to the JVM should be significantly larger than the RAM buffer used for indexing.
NOTE: the account of RAM usage for pending deletions is only approximate. Specifically, if you delete by Query, Lucene currently has no way to measure the RAM usage of individual Queries so the accounting will under-estimate and you should compensate by either calling commit() or refresh() periodically yourself.
NOTE: It is not guaranteed that all memory-resident documents are flushed once this limit is exceeded. Depending on the configured FlushPolicy, only a subset of the buffered documents is flushed, and therefore only part of the RAM buffer is released.
The default value is DEFAULT_RAM_BUFFER_SIZE_MB (16 MB).
Takes effect immediately, but only the next time a document is added, updated or deleted.
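To illustrate the partial-flush note, here is a Python sketch of one plausible flush policy: when the combined per-thread buffers reach the limit, only the largest buffer is flushed, so the RAM still held afterwards can remain well above zero. Lucene's default policy behaves broadly like this, but ''pick_thread_to_flush'' and the thread names are hypothetical, not its implementation.

<code python>
from typing import Dict, Optional


def pick_thread_to_flush(thread_ram_mb: Dict[str, float],
                         ram_buffer_size_mb: float) -> Optional[str]:
    """Hypothetical policy: once the combined buffers reach the limit,
    flush only the largest per-thread buffer."""
    if sum(thread_ram_mb.values()) < ram_buffer_size_mb:
        return None
    return max(thread_ram_mb, key=thread_ram_mb.get)


threads = {"t1": 6.0, "t2": 9.0, "t3": 3.0}
victim = pick_thread_to_flush(threads, ram_buffer_size_mb=16.0)
still_buffered = sum(mb for t, mb in threads.items() if t != victim)
print(victim, still_buffered)  # t2 9.0
</code>

After flushing ''t2'', 9 MB of the 18 MB buffered remain in memory, which is why the RAM buffer is only partly released.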


===setRAMPerThreadHardLimitMB(int perThreadHardLimitMB)===

Expert: Sets the maximum memory consumption per thread triggering a forced flush if exceeded. A DocumentsWriterPerThread is forcefully flushed once it exceeds this limit even if the getRAMBufferSizeMB() has not been exceeded. 
This is a safety limit that prevents a DocumentsWriterPerThread from exhausting its address space, because it uses 32-bit signed integer based memory addressing. The given value must be less than 2GB (2048MB).
See also: DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
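The Python sketch below models the forced flush described above: a thread whose buffer exceeds the per-thread hard limit is flushed even though the global RAM buffer has not been reached. ''check_flush'' and the thread names are hypothetical; only the 1945 MB default is taken from this page.

<code python>
from typing import Dict, List, Tuple


def check_flush(thread_ram_mb: Dict[str, float], ram_buffer_size_mb: float,
                per_thread_hard_limit_mb: int = 1945) -> Tuple[List[str], bool]:
    """Hypothetical check returning (threads forced to flush, global limit hit)."""
    forced = sorted(t for t, mb in thread_ram_mb.items()
                    if mb > per_thread_hard_limit_mb)
    global_exceeded = sum(thread_ram_mb.values()) >= ram_buffer_size_mb
    return forced, global_exceeded


# A large global buffer is not exceeded, yet t1 is past the per-thread cap.
print(check_flush({"t1": 1950.0, "t2": 400.0}, ram_buffer_size_mb=4096.0))
# (['t1'], False)
</code>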


===setReaderPooling(boolean readerPooling)===

By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling DirectoryReader.open(IndexWriter). 
This method lets you enable pooling without getting a near-real-time reader. NOTE: if you set this to false, IndexWriter will still pool readers once DirectoryReader.open(IndexWriter) is called.
Only takes effect when IndexWriter is first created.
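Reader pooling avoids reopening the same segment every time deletions or merges need it. The Python sketch below is a hypothetical per-segment cache (''SegmentReaderCache'' is not a Lucene class); the counter stands in for the cost of opening a reader.

<code python>
class SegmentReaderCache:
    """Hypothetical per-segment reader cache illustrating reader pooling."""

    def __init__(self, pooling: bool) -> None:
        self.pooling = pooling
        self.readers = {}  # segment name -> open reader
        self.opens = 0     # how many times a reader was actually opened

    def _open(self, segment: str) -> str:
        self.opens += 1  # stands in for the I/O cost of opening a segment
        return f"reader({segment})"

    def get(self, segment: str) -> str:
        if not self.pooling:
            return self._open(segment)  # open a fresh reader every time
        if segment not in self.readers:
            self.readers[segment] = self._open(segment)
        return self.readers[segment]


for pooling in (True, False):
    cache = SegmentReaderCache(pooling)
    # e.g. one delete and two merges all touching segment "_0"
    for _ in range(3):
        cache.get("_0")
    print(pooling, cache.opens)
# True 1
# False 3
</code>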


===setUseCompoundFile(boolean useCompoundFile)===

Sets if the IndexWriter should pack newly written segments in a compound file. Default is true.
Use false for batch indexing with very large RAM buffer settings.
Note: To control compound file usage during segment merges see MergePolicy.setNoCFSRatio(double) and MergePolicy.setMaxCFSSegmentSizeMB(double). This setting only applies to newly created segments.
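The merge-time decision is governed by the two MergePolicy settings named in the note. The Python sketch below shows one way they can combine, modelled on (not copied from) Lucene's merge policy logic: a merged segment becomes a compound file only if it is no larger than ''max_cfs_segment_size_mb'' and no larger than ''no_cfs_ratio'' times the total index size. The function name and the 0.1 default ratio are assumptions for illustration; check the MergePolicy documentation of your Lucene version for the actual defaults.

<code python>
def use_compound_file(merged_size_mb: float, total_index_size_mb: float,
                      no_cfs_ratio: float = 0.1,
                      max_cfs_segment_size_mb: float = float("inf")) -> bool:
    """Hypothetical model of a merge-time compound-file decision."""
    if no_cfs_ratio == 0.0:
        return False  # compound files disabled for merged segments
    if merged_size_mb > max_cfs_segment_size_mb:
        return False  # merged segment too large for a compound file
    if no_cfs_ratio >= 1.0:
        return True   # always use compound files (subject to the size cap)
    return merged_size_mb <= no_cfs_ratio * total_index_size_mb


print(use_compound_file(50, 1000))                              # True
print(use_compound_file(200, 1000))                             # False
print(use_compound_file(50, 1000, max_cfs_segment_size_mb=10))  # False
</code>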
