~~Title: Generating Features~~
~~NOTOC~~

<html><font color=#990000 size="+2"><b>Cognitive Automation: Generating Features</b></font></html>

Features can be generated in a variety of ways. The examples on this page extract term-frequency (TF) and TF-IDF features from a text document.

A reactive data platform makes it easy for developers and analysts to implement new data processing techniques by making them part of the data processing language. Instead of relying on complex frameworks and specialized programming languages, the capabilities are available as part of a general-purpose query engine. This lets developers build new functionality with familiar tools and apply reactive programming techniques to [[wp>microservice]] development, back-office automation, [[wp>Internet_of_things|IoT]] or [[wp>big data]] analytics applications.

==== Text Feature Extraction ====

<sxh rpl; gutter: false;>

explain feature set for text at '/Users/nkutuzov/Streamscape/mnodes/Sysplex1/TestNode12_nlp/datasets/bbc/politics/002.txt' order by tfidf desc top 20

SeqId  Id   Feature      TF  TFIDF
-----  ---  -----------  --  ------------------
0      1    information  9   2.537028199562189
1      14   mail         4   2.176573958740016
2      398  laws         3   1.9732029520822456
3      441  lib          2   1.8009487223008052
4      213  rules        3   1.7827882477235069
5      317  act          3   1.6902978398271522
6      418  civil        1   1.6010615019393333
7      557  staff        1   1.5236017711180112
8      146  authorities  1   1.518143619228522
9      566  lord         1   1.5127485687746867
10     473  details      1   1.5074151773903293
11     497  millions     1   1.4917712544389623
12     9    parliament   1   1.4766347924208238
13     516  alan         1   1.47169647473324
14     50   content      1   1.457187534100559
15     222  committee    1   1.4075433457732252
16     301  chancellor   1   1.394872884209115
17     388  force        1   1.394872884209115
18     94   brought      1   1.3825373642439756
19     57   tories       1   1.347374293531142


explain tokenize text at '/Users/nkutuzov/Streamscape/mnodes/Sysplex1/TestNode12_nlp/datasets/bbc/politics/002.txt' tokens

Token           Count  Part Of Speech
--------------  -----  --------------
information     9      NN
mail            6      NNS
freedom         5      NN
thomas          4      NNS
mail            4      NN
commissioner    3      NN
disclosure      3      NN
rule            3      NNS
act             3      NN
law             3      NNS
beith           2      NN
guidance        2      NN
probe           2      NNS
destruction     2      NN
destroy         2      VB
record          2      NNS
timing          2      NN
deletion        2      NN
cabinet         2      NN
subject         2      JJ
lib             2      NN
prevent         2      VB
retention       2      NN
public          2      JJ

</sxh>
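The ''TF'' and ''TFIDF'' columns above can be illustrated with a small Python sketch. The exact weighting used by the platform is not stated on this page, so the formula below (smoothed IDF with logarithmically damped TF, a common textbook variant) is an assumption. The function name ''tfidf_top'' and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def tfidf_top(docs, doc_index, top=20):
    """Return (term, tf, tfidf) rows for one document, highest TF-IDF first."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    tf = Counter(docs[doc_index])
    rows = []
    for term, count in tf.items():
        # Assumed weighting: smoothed IDF and damped (sublinear) TF.
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
        rows.append((term, count, (1 + math.log(count)) * idf))
    # Highest score first; ties broken alphabetically for stable output.
    rows.sort(key=lambda r: (-r[2], r[0]))
    return rows[:top]

# Toy corpus of already-tokenized documents (illustrative only).
corpus = [
    "information commissioner information laws mail".split(),
    "election campaign mail".split(),
    "budget tax election".split(),
]
for term, tf, score in tfidf_top(corpus, 0, top=3):
    print(f"{term:12} {tf:2d} {score:.4f}")
```

Terms that occur in many documents (here ''mail'') receive a lower IDF and fall down the ranking, which mirrors the role of ''order by tfidf desc top 20'' in the query above.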

<sxh rpl; gutter: false;>
	alter training step (Tokenizer) set algorithm properties (tokenizer.lemmatizer='true')
</sxh>
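Judging by its name, the ''tokenizer.lemmatizer'' property enables lemmatization, which reduces inflected forms to a base form before counting. The following Python sketch shows the effect on token counts. It is an illustration only: the ''LEMMAS'' table and the ''count_tokens'' function are hypothetical, and a real lemmatizer relies on a dictionary and part-of-speech information rather than a hand-written map.

```python
from collections import Counter

# Hypothetical lemma table, for illustration only.
LEMMAS = {"laws": "law", "rules": "rule", "records": "record", "probes": "probe"}

def count_tokens(tokens, lemmatize=False):
    """Count tokens, optionally mapping each one to its lemma first."""
    if lemmatize:
        tokens = [LEMMAS.get(t, t) for t in tokens]
    return Counter(tokens)

tokens = "laws law rules rule rules records".split()
print(count_tokens(tokens))                  # surface forms counted separately
print(count_tokens(tokens, lemmatize=True))  # inflected forms merged
```

With lemmatization on, singular and plural forms share one count, so fewer distinct features are produced and their frequencies are higher.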


The ''Part Of Speech'' column uses Penn Treebank tags:

^ Tag  ^ Description ^
| CC   | Coordinating conjunction |
| CD   | Cardinal number |
| DT   | Determiner |
| EX   | Existential there |
| FW   | Foreign word |
| IN   | Preposition or subordinating conjunction |
| JJ   | Adjective |
| JJR  | Adjective, comparative |
| JJS  | Adjective, superlative |
| LS   | List item marker |
| MD   | Modal |
| NN   | Noun, singular or mass |
| NNS  | Noun, plural |
| NNP  | Proper noun, singular |
| NNPS | Proper noun, plural |
| PDT  | Predeterminer |
| POS  | Possessive ending |
| PRP  | Personal pronoun |
| PRP$ | Possessive pronoun |
| RB   | Adverb |
| RBR  | Adverb, comparative |
| RBS  | Adverb, superlative |
| RP   | Particle |
| SYM  | Symbol |
| TO   | to |
| UH   | Interjection |
| VB   | Verb, base form |
| VBD  | Verb, past tense |
| VBG  | Verb, gerund or present participle |
| VBN  | Verb, past participle |
| VBP  | Verb, non-3rd person singular present |
| VBZ  | Verb, 3rd person singular present |
| WDT  | Wh-determiner |
| WP   | Wh-pronoun |
| WP$  | Possessive wh-pronoun |
| WRB  | Wh-adverb |
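The tag codes listed above can be attached to tokenizer output as readable labels. The Python sketch below is a minimal illustration: the ''POS_TAGS'' dictionary holds only a subset of the tags, the ''describe'' function is hypothetical, and the sample rows are copied from the tokenizer output earlier on this page.

```python
# Subset of the part-of-speech tags listed above.
POS_TAGS = {
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "JJ": "Adjective",
    "VB": "Verb, base form",
}

def describe(rows):
    """Replace each tag code in (token, count, tag) rows with its description."""
    return [(token, count, POS_TAGS.get(tag, "Unknown tag"))
            for token, count, tag in rows]

rows = [("information", 9, "NN"), ("mail", 6, "NNS"),
        ("destroy", 2, "VB"), ("subject", 2, "JJ")]
for token, count, label in describe(rows):
    print(f"{token:12} {count:2d}  {label}")
```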

The source text of ''politics/002.txt'', the document analyzed in the examples on this page:
 
Watchdog probes e-mail deletions
 
The information commissioner says he is urgently asking for details of Cabinet Office orders telling staff to delete e-mails more than three months old.
 
Richard Thomas "totally condemned" the deletion of e-mails to prevent their disclosure under freedom of information laws coming into force on 1 January. Government guidance said e-mails should only be deleted if they served "no current purpose", Mr Thomas said. The Tories and the Lib Dems have questioned the timing of the new rules.
 
Tory leader Michael Howard has written to Tony Blair demanding an explanation of the new rules on e-mail retention. On Monday Lib Dem constitutional affairs committee chairman Alan Beith warned that the deletion of millions of government e-mails could harm the ability of key probes like the Hutton Inquiry. The timing of the new rules just before the Freedom of Information Act comes into forces was "too unlikely to have been a coincidence", Mr Beith said. But a Cabinet Office spokeswoman said the move was not about the new laws or "the destruction of important records". Mr Beith urged the information commissioner to look at how the "e-mail regime" could "support the freedom of information regime".
 
Mr Thomas said: "The new Act of Parliament makes it very clear that to destroy records in order to prevent their disclosure becomes a criminal offence." He said there was already clear guidance on the retention of e-mails contained in a code of practice from the lord chancellor. All e-mails are subject to the freedom of information laws, but the important thing was the content of the e-mail, said Mr Thomas.
 
"If in doubt retain, that has been the long-standing principle of the civil service and public authorities. It's only when you've got no further use for the particular record that it may be legitimate to destroy it. "But any deliberate destruction to avoid the possibility of later disclosure is to be totally condemned." The Freedom of Information Act will cover England, Wales and Northern Ireland from next year. Similar measures are being brought in at the same time in Scotland. It provides the public with a right of access to information held by about 100,000 public bodies, subject to various exemptions. Its implementation will be monitored by the information commissioner.
 
<sxh rpl; gutter: false;>
 
explain feature set for text at '/Users/nkutuzov/Streamscape/mnodes/Sysplex1/TestNode12_nlp/datasets/bbc/politics/002.txt' order by tf
 
SeqId  Id   Feature      TF  TFIDF
-----  ---  -----------  --  ------------------
0      9    parliament   1   1.4766347924208238
1      50   content      1   1.457187534100559
2      57   tories       1   1.347374293531142
3      61   ve           1   1.1037497065632471
4      94   brought      1   1.3825373642439756
5      98   access       1   1.280844863358869
6      137  monday       1   1.1973960311122511
7      142  comes        1   1.214064469659888
8      146  authorities  1   1.518143619228522
9      159  move         1   0.9258326432357146
10     222  committee    1   1.4075433457732252
11     231  time         1   0.4624281298370358
12     282  tony         1   1.1606729724436073
13     285  tory         1   1.300847925427034
14     301  chancellor   1   1.394872884209115
15     310  howard       1   1.2940845220191695
16     388  force        1   1.394872884209115
17     405  blair        1   1.1734591603980609
18     418  civil        1   1.6010615019393333
19     423  key          1   1.1173429005939914
20     435  wales        1   1.300847925427034
21     448  record       1   0.9590107400214553
22     461  similar      1   1.307710794208892
23     473  details      1   1.5074151773903293
24     488  service      1   1.0372671218511866
25     497  millions     1   1.4917712544389623
26     503  coming       1   1.1632023372626323
27     516  alan         1   1.47169647473324
28     534  makes        1   1.307710794208892
29     557  staff        1   1.5236017711180112
30     562  warned       1   1.1708737428894405
31     565  look         1   1.0490324508795514
32     566  lord         1   1.5127485687746867
33     573  chairman     1   1.138505216583939
34     592  held         1   1.0840746615119172
35     607  scotland     1   1.3217469399862882
36     608  michael      1   1.0672560244864255
37     621  england      1   1.0295851925592865
38     653  ireland      1   1.304266744136668
39     663  months       1   0.8499904761367529
40     664  support      1   1.1127677005489016
41     679  january      1   1.0450776249504312
42     709  leader       1   1.1457780910583983
43     718  current      1   1.0905412776872496
44     184  government   2   0.8303516939488327
45     441  lib          2   1.8009487223008052
46     717  office       2   1.2438580196886269
47     213  rules        3   1.7827882477235069
48     315  public       3   1.1414263354103111
49     317  act          3   1.6902978398271522
50     398  laws         3   1.9732029520822456
51     14   mail         4   2.176573958740016
52     1    information  9   2.537028199562189
</sxh>
