So, which method is preferable – to preinitialize the dict or to use your last update?
Depends on how you plan to use it.
If you’re sure that words that aren’t in the list shouldn’t be collected, why add them? Go with the preinitialized dict.
It is all about the performance.
If there’s not much difference, collect everything and then retrieve from the dict by key; then of course you’ll have to keep the else statement.
…
Just to be clear, if you go with the predefined dict you won’t have a -- key in the dict after you process the file.
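To make the trade-off concrete, here is a minimal sketch of both variants. The names FirstWordIndex, CollectListed and CollectAll are made up for illustration, and the first word is taken with Split for brevity rather than with the hand-written loop from the fastest version. It sticks to older C# syntax on the assumption that the code gets compiled at runtime from MAXScript.

```csharp
using System;
using System.Collections.Generic;

public static class FirstWordIndex
{
    // First whitespace-delimited token of the line, lowercased; null for a blank line.
    static string FirstWord(string line)
    {
        string[] parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return parts.Length == 0 ? null : parts[0].ToLower();
    }

    // Variant A: preinitialized dict. Only the requested words ever become keys,
    // so lines starting with anything else (for example "--") are skipped.
    public static Dictionary<string, List<int>> CollectListed(string[] lines, string[] words)
    {
        var data = new Dictionary<string, List<int>>();
        foreach (string w in words) data[w.ToLower()] = new List<int>();

        for (int n = 0; n < lines.Length; n++)
        {
            string word = FirstWord(lines[n]);
            List<int> hits;
            if (word != null && data.TryGetValue(word, out hits)) hits.Add(n + 1);
        }
        return data;
    }

    // Variant B: collect everything. Unknown first words need the "else" branch
    // that creates a new entry, and the caller filters by key afterwards.
    public static Dictionary<string, List<int>> CollectAll(string[] lines)
    {
        var data = new Dictionary<string, List<int>>();
        for (int n = 0; n < lines.Length; n++)
        {
            string word = FirstWord(lines[n]);
            if (word == null) continue;

            List<int> hits;
            if (!data.TryGetValue(word, out hits))
            {
                hits = new List<int>();
                data[word] = hits;
            }
            hits.Add(n + 1);
        }
        return data;
    }
}
```

Both variants store 1-based line numbers. TryGetValue is used so that each line costs a single dictionary lookup instead of ContainsKey followed by the indexer.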
Update:
This one must be the fastest version so far: ~45 ms.
code
using System;
using System.Collections.Generic;
using System.IO;

// All lines that start with the same first word, plus their 1-based line numbers.
public class Line
{
    public int count = 0;
    public List<string> lines;
    public List<int> indexes;

    public Line()
    {
        count = 0;
        lines = new List<string>(){};
        indexes = new List<int>();
    }

    public Line( string line, int index )
    {
        count = 1;
        lines = new List<string>(){ line };
        indexes = new List<int>(){ index };
    }

    public void AddLine( string line, int index )
    {
        lines.Add( line );
        indexes.Add( index );
        count++;
    }
}

public class TextProcessor
{
    public Dictionary<string, Line> data;
    public string[] keys;
    public Line[] lines;

    public void ProcessFile(string file , string[] words)
    {
        data = new Dictionary<string, Line>();
        //foreach( var w in words) data[w] = new Line();
        var spaces = new char[]{' ','\t','\r','\n'};
        string[] lines = File.ReadAllLines(file);
        if (lines != null)
        {
            int line_index = 0;
            string word;
            foreach(string line in lines)
            {
                line_index++;

                // #1
                //word = line.TrimStart(spaces).Split()[0].ToLower();

                // #2
                //word = line.TrimStart(spaces);
                //int len = word.IndexOfAny( spaces );
                //word = word.Substring( 0, len < 0 ? word.Length : len ).ToLower();

                // #3
                //var chars = line.ToCharArray();
                //int f1 = Array.FindIndex(chars, x => !char.IsWhiteSpace(x));
                //if ( f1 < 0 ) continue;
                //int f2 = Array.FindIndex(chars, f1, x => char.IsWhiteSpace(x));
                //word = line.Substring( f1, f2 - f1 ).ToLower();

                // #4: find the first non-whitespace character by hand
                int f1 = -1;
                for ( int i = 0; i < line.Length; i++ )
                {
                    if ( !char.IsWhiteSpace(line[i]) )
                    {
                        f1 = i;
                        break;
                    }
                }
                if ( f1 < 0 ) continue;

                // then the first whitespace character after it
                int f2 = -1;
                for ( int i = f1; i < line.Length; i++ )
                {
                    if ( char.IsWhiteSpace(line[i]) )
                    {
                        f2 = i;
                        break;
                    }
                }

                word = line.Substring( f1, f2 - f1 ).ToLower();
                if (data.ContainsKey(word))
                {
                    data[word].AddLine( line, line_index );
                }
                else
                {
                    data[word] = new Line( line, line_index );
                }
            }
        }

        // Flatten the dictionary into arrays
        keys = new string[data.Keys.Count];
        data.Keys.CopyTo(keys, 0);
        this.lines = new Line[data.Values.Count];
        data.Values.CopyTo(this.lines, 0);
    }
}
Yep, the last one is the fastest.
When I make it work with a string, for some files there is an error:
– Runtime error: .NET runtime exception: Length cannot be less than zero.
Found why. If a line starts with / or with ---- there is an error.
The same code here works with no errors: https://dotnetfiddle.net/9KqZve
The times I have with different versions:
- pure maxscript: 0.68 sec
- first C# version with regex: 0.33 sec
- last C# version: 0.26 sec, including the time to fill the maxscript arrays with the data. The C# code alone takes 0.05 sec
Well, I doubt that the test file you provided is a good enough test case to ensure that the solution is bug-free.
Change this line like this and it will work:
int f2 = line.Length;
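For anyone reading along, the reason this fixes it: when nothing follows the first word (a line that is just / or ----), the second loop never finds whitespace, f2 keeps its initial -1, and Substring( f1, f2 - f1 ) is called with a negative length. Below is a minimal standalone sketch of variant #4 with the changed default. The wrapper class FirstToken and its Get method are hypothetical names used only for this illustration.

```csharp
public static class FirstToken
{
    // Returns the first whitespace-delimited token, lowercased,
    // or null when the line is empty or whitespace only.
    public static string Get(string line)
    {
        int f1 = -1;
        for (int i = 0; i < line.Length; i++)
        {
            if (!char.IsWhiteSpace(line[i]))
            {
                f1 = i;
                break;
            }
        }
        if (f1 < 0) return null;

        // Default to the end of the line: a token with nothing after it
        // never reaches the assignment inside the loop below.
        int f2 = line.Length;
        for (int i = f1; i < line.Length; i++)
        {
            if (char.IsWhiteSpace(line[i]))
            {
                f2 = i;
                break;
            }
        }
        return line.Substring(f1, f2 - f1).ToLower();
    }
}
```

With this default the length passed to Substring is always at least 1, because f1 points at a non-whitespace character and f2 is strictly greater than f1.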
nah, still two times slower. ~100ms
Cool, perhaps there’s nothing left to optimize.
Even the latest .NET 7 version shows 28 ms, which isn’t far from the 40+ ms on my device.
Curious, what is it all for? Some kind of obfuscator / source file analyzer?
To make navigating in scripts more user friendly.
You know that Ctrl+RMB click gives you a menu where you can see the controls, functions, events, etc. used in the current script, and you can go wherever you want by clicking the desired item.
When I see the whole screen (literally) full of items to click, and there are more items not shown… it is not an easy task to find what you need.
So, I created this:
For the currently opened script, the [ </> ] shows buttons for all available controls + local and global vars + struct names + functions + lines where I have format and print. Clicking a button shows the data. Clicking a row in the list shows the line number of the selected text; double-clicking a row selects the same line in the MaxScript editor and makes it visible, so navigation is easy and fast.
The button with the bookmark icon shows all bookmarks for the current script. The navigation is the same – click a row in the list to go to the desired bookmark.
The [ /* ] button shows all comments which start with --. The same way of navigation.
With the filter box I can find what I need much faster.
And most importantly, with your help and the help of Denis, collecting the data is much faster than a Ctrl+RMB click inside the MaxScript Editor.
What I have in my ToDo list:
- collect variables inside structs
- find a faster way to populate the listview. It takes more than 1.5 sec to fill it with 4000+ items.
what a bore you are!
public static string GetFirstWord(string line)
{
    int i = 0;
    while (i < line.Length && char.IsWhiteSpace(line, i)) { i++; }
    int k = i;
    while (k < line.Length && !char.IsWhiteSpace(line, k)) { k++; }
    return line.Substring(i, k-i);
}
btw… on my machine the difference for pure searching and using trim and split is 25 vs 40. So it’s not a big deal.
Serejah:
perhaps there’s nothing left to optimize
shouldn’t have said that
denisT:
public static string GetFirstWord(string line)
nice, sometimes it drops even below 40ms
looks really good. must be very convenient to use for anyone who is too lazy to switch to vscode or similar.
miauu:
- find a faster way to populate the listview. It takes more than 1.5 sec to fill it with 4000+ items.
listview supports virtualization, maybe that’s the way to go. (never used it myself)
miauu:
- collect variables inside structs
did you mean struct properties?
It shouldn’t be a big deal: find a struct definition, then find the opening ( and the closing ), and tokenize everything that’s inside that range.
code
just the idea
...
re_options_m = dotNet.combineEnums (dotNetClass "System.Text.RegularExpressions.RegexOptions").MultiLine (dotNetClass "System.Text.RegularExpressions.RegexOptions").IgnoreCase,
...
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- 2. Collect struct defs
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
(
    local pattern_struct = "\bstruct\s(\w+)\b"
    matches = (dotNetClass "System.Text.RegularExpressions.RegEx").Matches code pattern_struct re_options_m
    for i = 0 to matches.count - 1 do
    (
        local item = matches.item[i].groups.item[1]
        local match_start = 1 + item.index
        local match_end = match_start + item.value.count
        if valid[ match_start ] and valid[ match_end ] do
        (
            append STRUCT_TOKENS ( Token type:"struct" value:item.value start:match_start end:match_end )
        )
    )
)
Thank you.
I use vscode for Python and PowerShell, but I can’t force myself to use it (along with notepad++) for maxscript.
Serejah:
listview supports virtualization, maybe that’s the way to go. (never used it myself)
Thank you. Will check it in the coming days. First I have to “fix and arrange” the code of the whole script.
Serejah:
did you mean struct properties?
Yes. The functions defined inside structs are collected by the script, but the variables do not start with local or global, so they are not collected.
miauu:
… but the variables do not start with local or global, so they are not collected.
But they either go after a comma , or after the struct opening ( and they come before the assignment = or the next comma or the closing ) parenthesis. Doesn’t look so complicated.
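A rough sketch of that idea, written in C# so it could sit next to the other helpers. Everything here is an assumption made for illustration: the StructMembers class and its Collect method are hypothetical names, the start argument is expected to be the index of a struct match (for example from the regex above), and the scanner does not understand strings or comments, so a parenthesis or comma inside either one will confuse it.

```csharp
using System;
using System.Collections.Generic;

public static class StructMembers
{
    // Returns the variable names declared in the struct body that opens at the
    // first "(" found at or after 'start'. Function members are skipped.
    public static List<string> Collect(string code, int start)
    {
        var names = new List<string>();
        int open = code.IndexOf('(', start);
        if (open < 0) return names;

        int depth = 0;
        int itemStart = open + 1;
        for (int i = open; i < code.Length; i++)
        {
            char c = code[i];
            if (c == '(') depth++;
            else if (c == ')') depth--;

            bool closing = c == ')' && depth == 0;   // matching ")" of the struct body
            bool separator = c == ',' && depth == 1; // comma directly inside the body
            if (closing || separator)
            {
                AddName(code.Substring(itemStart, i - itemStart), names);
                itemStart = i + 1;
                if (closing) break;
            }
        }
        return names;
    }

    // One member looks like "name", "name = value" or "fn name args = ( ... )".
    static void AddName(string item, List<string> names)
    {
        int eq = item.IndexOf('=');
        string head = eq < 0 ? item : item.Substring(0, eq);
        string[] words = head.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        foreach (string w in words)
        {
            string lower = w.ToLower();
            if (lower == "public" || lower == "private") continue;           // visibility keywords
            if (lower == "fn" || lower == "function" || lower == "on") return; // functions, event handlers
            names.Add(w);
            return;
        }
    }
}
```

Tracking the parenthesis depth is what keeps commas inside array literals or function bodies from being treated as member separators.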
Serejah:
var word = line.Trim().Split(null, 2)[0].ToLower();
TrimStart of course
var word = line.TrimStart().Split(null, 2)[0].ToLower();