[invalid] Uses file Lexer instead of selected Lexer


Postby immortalnights » Tue Dec 21, 2010 5:41 pm

Once again, I'm not sure whether this is a bug, so I'm writing here first before filing an official issue.

It appears that the CodeBrowser only ever uses the information within the hard-coded Lexer rather than the one the user may have selected from the Settings -> Lexer options list.

Based on the fact that the Launcher plugin does appear to change its options based upon the selected Lexer, I assume it's wrong that CodeBrowser does not? Unless I'm just not seeing it happen, or there is a reason for it not to do so.
immortalnights
User
 
Posts: 39
Joined: Tue Dec 21, 2010 2:02 pm

Re: [CodeBrowser] Uses file Lexer instead of selected Lexer

Postby cody » Tue Dec 21, 2010 6:27 pm

Hello,

I can't reproduce this issue.

I tried to do the following:

1) Opened a python file
2) Observed codebrowser showing the python code outline
3) Settings->SetLexer->C++
4) Observed that the codebrowser outline cleared, since it couldn't parse the file as C++ code

What version of Editra and CodeBrowser are you using? Also what Operating System are you using?

Do you have a concrete example that reproduces the issue?


Cody
cody
Site Admin
 
Posts: 1315
Joined: Mon Oct 09, 2006 2:49 am
Location: United States

Re: [CodeBrowser] Uses file Lexer instead of selected Lexer

Postby immortalnights » Tue Dec 21, 2010 6:50 pm

I believe this to be an issue with the CPP CodeBrowser being unable to read my header file at all.

Having done what you suggested (starting with a Python script and changing to CPP), it did as you said and resulted in no definitions or code to browse.

Since writing this I have read that the CPP support in CodeBrowser is limited. It's something that would be very useful to me (and one of the features that is pulling me from SciTE :)), so I may take some time to work on improving this.

---

A final related question though - is there some reason why the CodeBrowser doesn't display methods within classes in a CPP header file when the method's implementation is not also within the header file?

As in, within the following:
[code]
class CppHeaderTest
{
    bool TestMethodInline() const
    {
        return true;
    }

    void TestMethodDefine() const;
};
[/code]

Only 'TestMethodInline' is displayed in CodeBrowser; neither the class nor 'TestMethodDefine' is. (This is the primary reason my header file didn't show any changes with the CPP Lexer.)

Re: [CodeBrowser] Uses file Lexer instead of selected Lexer

Postby cody » Tue Dec 21, 2010 7:04 pm

Hello,

There are a few things at play here:

1) I think that by default .h files are associated with C so it doesn't know about classes.

2) As you found out, the C/C++/Java codebrowser is very simplistic currently (it uses regex); it was just added as a better-than-nothing version with hopes that someone else could pick it up and improve upon it ;) hint hint

3) I have been intending to add some additional metadata to the file type identification that may be necessary to improve this situation. Currently the parser just sees that it has a C++ file and doesn't differentiate its parsing for the header file, which would require different scope and naming checks than the cpp file.

e.g. in a header file:
[code]
class Foo
{
public:
    void Bar();
};
[/code]

Scoping would be determined by the braces.

In the cpp file it's determined by the binary scope resolution operator (i.e. Foo::Bar).

This could probably be resolved by passing the file name into the parser.
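
To make the header/source distinction concrete, here is a minimal Python sketch of a parser entry point that receives the file name and picks a scoping strategy from the extension. Everything in it (parse, parse_header, parse_source, the extension list, the regexes) is a hypothetical illustration and not Editra's actual taglib code. It is still regex-based and will be fooled by forward declarations, comments, and plain qualified calls.

```python
import os
import re

# All names here are hypothetical; this is not Editra's taglib API.
HEADER_EXTS = {".h", ".hh", ".hpp", ".hxx"}
CLASS_RE = re.compile(r"\bclass\s+(\w+)")
DECL_RE = re.compile(r"(\w+)\s*\([^()]*\)\s*(?:const\s*)?[;{]")
QUALIFIED_RE = re.compile(r"(\w+)::(\w+)\s*\(")


def parse_header(text):
    """Scope by braces: a declaration belongs to the class body it sits in."""
    members, current, pending, depth = [], None, None, 0
    for line in text.splitlines():
        found_class = CLASS_RE.search(line)
        if found_class:
            pending = found_class.group(1)
        elif current and depth == 1:
            decl = DECL_RE.search(line)
            if decl:
                members.append((current, decl.group(1)))
        # Track brace depth so we know when a class body opens and closes.
        for ch in line:
            if ch == "{":
                depth += 1
                if pending and depth == 1:
                    current, pending = pending, None
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    current = None
    return members


def parse_source(text):
    """Scope by name: Foo::Bar( is read as method Bar of class Foo."""
    return QUALIFIED_RE.findall(text)


def parse(text, filename):
    """Pick the scoping strategy from the file extension."""
    ext = os.path.splitext(filename)[1].lower()
    return parse_header(text) if ext in HEADER_EXTS else parse_source(text)
```

With this, the Foo header above and a matching `void Foo::Bar()` definition in a .cpp file both yield the same (class, method) pair, which is the point of passing the file name in.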

Cody

Re: [CodeBrowser] Uses file Lexer instead of selected Lexer

Postby immortalnights » Wed Dec 22, 2010 4:30 pm

cody wrote:1) I think that by default .h files are associated with C so it doesn't know about classes.

This isn't the case, as specifying the file as ".hpp" (a CPP header) suffers the same lack of classes.

cody wrote:2) As you found out, the C/C++/Java codebrowser is very simplistic currently (it uses regex); it was just added as a better-than-nothing version with hopes that someone else could pick it up and improve upon it ;) hint hint

Regex is the best way to do this, right? Processing each line as you go down the file and filling in the CodeBrowser as appropriate.

I'm assuming though that there isn't a way for it to update dynamically at the moment, as in, it would have to re-parse the full file when it changes? Would the CodeBrowser work being dynamically updated as the user makes changes within the file? Or would that be too much processing work for a plugin?

cody wrote:3) I have been intending to add some additional metadata to the file type identification that may be necessary to improve this situation. Currently the parser just sees that it has a C++ file and doesn't differentiate its parsing for the header file, which would require different scope and naming checks than the cpp file.

Yes, a quick solution would be to send in the file name for the CodeBrowser, but deeper internal metadata would be nicer still. Though I expect that would be a heavier change.

What about CodeBrowser caching? Am I right in assuming that if the user views Buffer-1, then moves to Buffer-2, and then back to Buffer-1, the CodeBrowser re-parses the file on each occasion? Would some serialized cache be a suitable solution?

(Obviously, I don't intend you to rush off and make the above changes - once I'm a little more familiar with Editra and CodeBrowser I would be interested in helping out.)

Re: [CodeBrowser] Uses file Lexer instead of selected Lexer

Postby cody » Wed Dec 22, 2010 5:08 pm

immortalnights wrote:
cody wrote:2) As you found out, the C/C++/Java codebrowser is very simplistic currently (it uses regex); it was just added as a better-than-nothing version with hopes that someone else could pick it up and improve upon it ;) hint hint

Regex is the best way to do this, right? Processing each line as you go down the file and filling in the CodeBrowser as appropriate.


No, regex is just easier. It would be better to write an appropriate line/token parser to get more complete and accurate results. It is very hard to determine scoping and context accurately with a simple regex search.
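
As a rough illustration of what a token-based approach buys over a line-by-line regex search, the Python sketch below tokenizes the text first, dropping comments and string literals, and then tracks scope with a stack of open braces. The names (tokens, class_members) and the token set are assumptions made up for this example, not the codebrowser's real parser, and it only handles a small slice of C++.

```python
import re

# Comments and string literals are matched first so that braces inside
# them never reach the scope tracker.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<punct>[{}();:])
    """,
    re.VERBOSE | re.DOTALL,
)


def tokens(text):
    """Yield (kind, value) pairs, skipping comments and string literals."""
    for match in TOKEN_RE.finditer(text):
        if match.lastgroup not in ("comment", "string"):
            yield match.lastgroup, match.group()


def class_members(text):
    """Return (class, name) pairs for names followed by '(' directly in a class body."""
    found, stack = [], []  # one stack entry per open brace: class name or None
    pending = None
    prev_kind, prev_val = None, None
    for kind, val in tokens(text):
        if kind == "ident" and prev_val == "class":
            pending = val                 # class name, body not opened yet
        elif val == ";":
            pending = None                # forward declaration, no body
        elif val == "{":
            stack.append(pending)         # class body, or None for other blocks
            pending = None
        elif val == "}" and stack:
            stack.pop()
        elif val == "(" and prev_kind == "ident" and stack and stack[-1]:
            found.append((stack[-1], prev_val))
        prev_kind, prev_val = kind, val
    return found
```

Because the innermost open brace decides the scope, both an inline method and a declaration-only method are attributed to the enclosing class, and anything inside a method body is ignored.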

immortalnights wrote:I'm assuming though that there isn't a way for it to update dynamically at the moment, as in, it would have to re-parse the full file when it changes? Would the CodeBrowser work being dynamically updated as the user makes changes within the file? Or would that be too much processing work for a plugin?


It depends on how the parser is implemented. If it wanted to cache data internally, it could save it between requests and try to determine what has changed. However, currently all the codebrowser parsers do a full reparse of the text buffer at the update events that are used to trigger a refresh (file open, tab change, lexer change, file save). This has not been much of an issue; the main performance bottleneck is actually when the parser thread completes and the UI builds the outline. I have an optimization ticket in the issue tracker to build the tree dynamically as nodes are expanded instead of doing it all up front. This would reduce a little bit of lag that is associated with populating the tree ctrl widget with all the nodes up front and storing all the data in the widget.
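
For reference, the simplest form of such a cache does not try to work out what changed; it only skips the reparse when a buffer's text is identical to the last time it was seen. The Python sketch below shows that idea under stated assumptions: ParseCache, the buffer id, and the parse function are all hypothetical names, not anything in the codebrowser today.

```python
import hashlib


class ParseCache:
    """Remember the last parse result per buffer, keyed by a digest of its text."""

    def __init__(self, parse_func):
        self._parse = parse_func
        self._entries = {}     # buffer id -> (digest, result)
        self.parse_count = 0   # how many real parses were needed

    def get(self, buf_id, text):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        cached = self._entries.get(buf_id)
        if cached is not None and cached[0] == digest:
            return cached[1]   # text unchanged: reuse the old result
        self.parse_count += 1
        result = self._parse(text)
        self._entries[buf_id] = (digest, result)
        return result

    def drop(self, buf_id):
        """Forget a buffer, e.g. when its file is closed."""
        self._entries.pop(buf_id, None)
```

A tab change back to an unmodified buffer would then cost one hash of the text rather than a full parse; a save with edits still triggers a full reparse.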

immortalnights wrote:
cody wrote:3) I have been intending to add some additional metadata to the file type identification that may be necessary to improve this situation. Currently the parser just sees that it has a C++ file and doesn't differentiate its parsing for the header file, which would require different scope and naming checks than the cpp file.

Yes, a quick solution would be to send in the file name for the CodeBrowser, but deeper internal metadata would be nicer still. Though I expect that would be a heavier change.


It probably wouldn't really be any more work, actually, since the parser interface would need to be updated to accept another parameter anyway. It would just be a matter of what would be good to pass in. I have written the codebrowser's 'taglib' in a way that it is independent of wx as a dependency, since it could be a useful library outside of Editra as well. It has many of the features of CTags and, to my knowledge, is better in some respects at maintaining the scope of objects in the outline.

I would probably change it to accept some sort of class object in case more than the filename becomes necessary in the future.
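
One possible shape for that, sketched in Python: a small context object that carries the file name today and leaves room for more fields later, so the parser signature does not have to change again. ParseContext, lang_id, extra, and generate_tags are invented names for illustration only; the real interface may look quite different.

```python
import os
from dataclasses import dataclass, field


@dataclass
class ParseContext:
    """Hypothetical bundle of per-file information handed to a parser."""
    filename: str = ""
    lang_id: str = ""                           # assumed file type identifier
    extra: dict = field(default_factory=dict)   # room for future metadata

    @property
    def is_header(self):
        ext = os.path.splitext(self.filename)[1].lower()
        return ext in (".h", ".hh", ".hpp", ".hxx")


def generate_tags(text, context=None):
    """Stand-in parser entry point: only reports which scoping rule it would use."""
    context = context or ParseContext()
    scoping = "braces" if context.is_header else "qualified-names"
    return {"scoping": scoping, "length": len(text)}
```

Callers that have nothing to pass can omit the context, which keeps the library usable outside Editra.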

immortalnights wrote:What about CodeBrowser caching? Am I right in assuming that if the user views Buffer-1, then moves to Buffer-2, and then back to Buffer-1, the CodeBrowser re-parses the file on each occasion? Would some serialized cache be a suitable solution?

(Obviously, I don't intend you to rush off and make the above changes - once I'm a little more familiar with Editra and CodeBrowser I would be interested in helping out.)


It could be done, though in my experience so far it has not seemed necessary or worth the amount of extra effort that it would require. The text processing is quite fast and is done on a background thread, so it does not have much of an effect on the UI. This is of course only true up to a point: if it is running on very large files (several MB or larger), then it is possible that the text processing will slow down the UI thread. But as Editra is intended for editing code files, which are almost always rather small (< 1MB), it is not much of an issue.
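
The background-thread pattern described here can be sketched with nothing but the standard library: take a snapshot of the text, parse it on a worker thread, and hand the result back through a queue. This is only an illustration; parse_outline is a toy stand-in parser, start_parse is a made-up helper, and a wx application would normally deliver the result to the UI thread through its own event mechanism rather than a plain queue.

```python
import queue
import threading


def parse_outline(text):
    """Toy stand-in parser: names of top-level Python 'def' lines."""
    return [line.split("(")[0][4:]
            for line in text.splitlines()
            if line.startswith("def ")]


def start_parse(snapshot, results):
    """Parse a text snapshot on a worker thread and queue the result."""
    def worker():
        results.put(parse_outline(snapshot))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    results = queue.Queue()
    start_parse("def a():\n    pass\ndef b(x):\n    pass\n", results).join()
    print(results.get())
```

Since the worker only ever sees the snapshot string, later edits in the buffer cannot race with the parse.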


Cody

Re: [CodeBrowser] Uses file Lexer instead of selected Lexer

Postby immortalnights » Thu Dec 23, 2010 12:41 pm

cody wrote:This has not been much of an issue; the main performance bottleneck is actually when the parser thread completes and the UI builds the outline. I have an optimization ticket in the issue tracker to build the tree dynamically as nodes are expanded instead of doing it all up front. This would reduce a little bit of lag that is associated with populating the tree ctrl widget with all the nodes up front and storing all the data in the widget.


To clarify, do you parse the buffer and populate the tree view as you go along, or parse the file and then populate the tree view as a final step? Assuming that it wouldn't take time to dynamically populate the branches as they were opened, I agree this would be a worthy improvement. Though it would be rather frustrating if at times there was a multi-second delay when opening a branch.

cody wrote:I would probably change it to accept some sort of class object in case more than the filename becomes necessary in the future.

Yes, I agree that would be the best solution.

cody wrote:It could be done, though in my experience so far it has not seemed necessary or worth the amount of extra effort that it would require. The text processing is quite fast and is done on a background thread, so it does not have much of an effect on the UI. This is of course only true up to a point: if it is running on very large files (several MB or larger), then it is possible that the text processing will slow down the UI thread. But as Editra is intended for editing code files, which are almost always rather small (< 1MB), it is not much of an issue.


I agree with the fact it does not appear to be all that slow anyway. However, one thing that does bug me is that if I am working on a number of files the branches are reset each time I change buffer - and worse still when I save changes to it. In terms of working with multiple classes and their definitions and implementations it becomes frustrating to have to constantly reopen the branches.

Maybe the solution would be to store the branch state, but that would only work if you could ensure that a branch would be 100% recognizable between parses, so as not to simply open random branches. Therefore I expect a full buffer-by-buffer cache would suit better.


Am I correct in assuming that the CodeBrowser doesn't attempt to parse a file unless it's recognized as a valid type - and a type that has definitions, wherever those definitions may be?

Re: [CodeBrowser] Uses file Lexer instead of selected Lexer

Postby cody » Thu Dec 23, 2010 2:52 pm

immortalnights wrote:To clarify, do you parse the buffer and populate the tree view as you go along, or parse the file and then populate the tree view as a final step? Assuming that it wouldn't take time to dynamically populate the branches as they were opened, I agree this would be a worthy improvement. Though it would be rather frustrating if at times there was a multi-second delay when opening a branch.


When one of the update messages is received, the codebrowser takes a snapshot of all the buffer's text, parses the whole thing on a background thread, generates a document structure (DocStruct) object, and gives it back to the UI thread. The UI thread then builds all the nodes on the TreeCtrl. The optimization would be to only build the exposed top-level nodes up front; then, as they are expanded, their children would be populated from the DocStruct. It would be unnoticeable from a user's perspective since this data is all available in memory already, and it would just get rid of the initial small lag that can sometimes be seen when the tree is first being created.
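
A toolkit-free Python sketch of that optimization is shown below. The parsed structure is modelled as a plain nested dict standing in for the DocStruct, and LazyNode and build_top_level are hypothetical names; in the real UI the expand step would be driven by the tree control's expand event.

```python
class LazyNode:
    """Tree node whose children are only created the first time it is expanded."""

    def __init__(self, label, struct):
        self.label = label
        self._struct = struct   # nested dict: child name -> child dict
        self.children = None    # None means "not built yet"

    @property
    def expandable(self):
        return bool(self._struct)

    def expand(self):
        if self.children is None:
            # The data is already in memory, so this is just object creation.
            self.children = [LazyNode(name, sub)
                             for name, sub in self._struct.items()]
        return self.children


def build_top_level(doc):
    """Build only the exposed top-level nodes up front."""
    return [LazyNode(name, sub) for name, sub in doc.items()]
```

Only the top-level entries cost anything at refresh time; each class body is turned into nodes when, and if, the user opens it.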

cody wrote:I would probably change it to accept some sort of class object in case more than the filename becomes necessary in the future.

Yes, I agree that would be the best solution.

immortalnights wrote:I agree with the fact it does not appear to be all that slow anyway. However, one thing that does bug me is that if I am working on a number of files the branches are reset each time I change buffer - and worse still when I save changes to it. In terms of working with multiple classes and their definitions and implementations it becomes frustrating to have to constantly reopen the branches.

Maybe the solution would be to store the branch state, but that would only work if you could ensure that a branch would be 100% recognizable between parses, so as not to simply open random branches. Therefore I expect a full buffer-by-buffer cache would suit better.


Yeah, it would be a nice feature addition to try to save the expansion state between tab changes and to only do an update to the exposed parts on a save. It would require caching some of the DocStruct information between changes; the file close message could probably be used as an indicator of when to delete stuff from the cache.
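
One way to keep this safe against "opening random branches" is to identify each branch by its full path of names and, after a reparse, restore only the paths that still exist. The Python sketch below assumes exactly that; ExpansionState and its method names are invented for illustration and are not part of the plugin.

```python
class ExpansionState:
    """Remember which outline branches were expanded, per file."""

    def __init__(self):
        self._expanded = {}   # file path -> set of node paths (tuples of names)

    def save(self, path, expanded_nodes):
        self._expanded[path] = set(expanded_nodes)

    def restore(self, path, current_nodes):
        """Return the saved node paths that still exist after a reparse."""
        return self._expanded.get(path, set()) & set(current_nodes)

    def on_file_closed(self, path):
        """Drop cached state when the file is closed."""
        self._expanded.pop(path, None)
```

A branch that was renamed or deleted between parses simply fails the intersection and stays collapsed, so nothing unexpected is opened.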

immortalnights wrote:Am I correct in assuming that the CodeBrowser doesn't attempt to parse a file unless it's recognized as a valid type - and a type that has definitions, wherever those definitions may be?


Yeah, if there isn't a parser then it doesn't try to do anything, since it can't. There is a mapping of filetype IDs to DocStruct tag generator modules in the tagload.py file in the codebrowser plugin.
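
In spirit, such a mapping is a lookup table from a file type identifier to a tag generator, with "no entry" meaning "do nothing". The Python sketch below only illustrates that pattern; the identifiers, the toy generator functions, and the TAG_GENERATORS table are assumptions and do not reflect the actual contents of tagload.py.

```python
def python_tags(text):
    """Toy generator: names of lines that start with 'def '."""
    return [line[4:].split("(")[0]
            for line in text.splitlines()
            if line.startswith("def ")]


def cpp_tags(text):
    """Toy generator: names of lines that start with 'class '."""
    return [line.split()[1]
            for line in text.splitlines()
            if line.startswith("class ") and len(line.split()) > 1]


# Hypothetical file type ids mapped to tag generators.
TAG_GENERATORS = {
    "python": python_tags,
    "cpp": cpp_tags,
}


def generate_outline(lang_id, text):
    """Return the outline, or None when no parser exists for this file type."""
    generator = TAG_GENERATORS.get(lang_id)
    return generator(text) if generator else None
```

Supporting a new language then means writing one generator and adding one table entry.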