Find length of string outside brackets

17 replies
Torque
User
I want to find the length of the part of a string that contains x characters outside the brackets.

Specific example: I have a string like 'abc(123)def ghij klm (456) opqrs'. Now I want to cut that string at the point where it contains 10 characters that are not between brackets. In this case that is between i and j, at position 15. How do I do this?

I came up with this

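Code:
local str = "abc(123)def ghij klm (456) opqrs"
local maxchars = 10
-- take maxchars characters, remove the balanced (...) groups in them,
-- and extend the cut by the number of characters that were removed
local position = maxchars + (maxchars - str:sub(1, maxchars):gsub("%b()", ""):len())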


But that doesn't work when the brackets are not closed within the first 10 characters.
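To see where it breaks, the one-liner can be wrapped in a small function and tried on a few inputs. The wrapper name `oneLiner` and the two extra test strings are made up for illustration. There seem to be two failure modes: when a bracket group is cut in half by the first `maxchars` characters, `%b()` finds no balanced pair at all, and when the characters added back contain another bracket group, those are never subtracted.

```lua
-- Torque's one-liner wrapped in a function (hypothetical name, for testing only)
local function oneLiner(str, maxchars)
  return maxchars + (maxchars - str:sub(1, maxchars):gsub("%b()", ""):len())
end

-- works: the only bracket group in the first 10 characters is closed
print(oneLiner("abc(123)def ghij klm (456) opqrs", 10)) --> 15

-- "(12345)" is cut at character 10, so %b() matches nothing: gives 10, should be 17
print(oneLiner("abcdefgh(12345)ij", 10)) --> 10

-- the 5 characters added back are "f(45)", another bracket group: gives 15, should be 19
print(oneLiner("abc(123)def(45)ghij", 10)) --> 15
```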

Thanks in advance for your help!
Relax, relate, release. Visit: www.lsdservers.boards.net
02.12.14 08:49:02 pm
Vladimir Putin
User
You have to split up that long third line first. You need to check whether maxchars is long enough.
So take the substring of length maxchars, and if it matches the pattern "...(...", make it longer until it reaches a ")".
Then just do the rest.

I don't use gsub often, so I have to figure it out as well.
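One possible reading of this suggestion, sketched under the assumption that brackets are not nested: cut at `maxchars`, and if the cut falls inside a bracket, move it to the next ")"; otherwise extend it by the number of outside characters still missing, and repeat. The function name `extendUntilClosed` is hypothetical. The cut position grows on every pass, so the loop always ends, even on malformed input.

```lua
-- Sketch of the "extend until ')'" idea (hypothetical name, non-nested brackets assumed)
local function extendUntilClosed(str, maxchars)
  if maxchars <= 0 then return "" end
  local p = maxchars
  while p < #str do
    local prefix = str:sub(1, p)
    -- drop balanced (...) groups; what is left is outside text, plus an
    -- unclosed "(..." if the cut fell inside a bracket
    local stripped = prefix:gsub("%b()", "")
    if stripped:find("(", 1, true) then
      -- the cut is inside a bracket: move it to the next ")"
      local close = str:find(")", p + 1, true)
      if not close then return str end  -- never closed: give up, return everything
      p = close
    else
      local missing = maxchars - #stripped
      if missing <= 0 then return prefix end
      -- each extra character adds at most one outside character
      p = p + missing
    end
  end
  return str
end

print(extendUntilClosed("abc(123)def ghij klm (456) opqrs", 10)) --> abc(123)def ghi
```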
edited 4×, last 02.12.14 09:20:12 pm
02.12.14 11:17:57 pm
VADemon
User
I'd like to help, but I didn't understand the explanation or the example. Please give us a more realistic one.
03.12.14 02:43:03 am
Joni And Friends
User
Is this what you mean? I don't really understand your explanation, actually.
Code:
s="abc(123)def ghij klm (456) opqrs"
maxchar=10
char_of_s_out_maxchar=s:sub((maxchar+1),#s)
position_of_s_out_maxchar=(#s-maxchar)
Web | file File does not exist (15501) | file File does not exist (15463) | file cs2d [JAF] Adventure (19) | file File does not exist (15919)
03.12.14 07:47:35 am
Alistaire
User
What kind of input do you want to handle, and what kind of output do you want from it?
IMG:http://i.imgur.com/5zhwOTP.png
03.12.14 11:40:04 am
Torque
User
Thanks for your suggestions, guys.
@Alistaire:
Input: any string of any length, and any integer X.
Output: the shortest prefix of Input that contains X characters that are not within round brackets '()'.

So in the example string, if X=5 it should return "abc(123)de"; this is the substring that has 5 characters outside the brackets. If X=10 it should return "abc(123)def ghi"; this substring has 10 characters outside the brackets.
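For balanced, non-nested brackets like in these examples, the number of characters outside brackets can be checked by deleting every balanced `(...)` group with `%b()` and measuring what is left. This is only a sketch for checking the expected outputs; the helper name `outsideCount` is made up. With an unclosed bracket, `%b()` finds no match, so that bracket and the text after it would be counted as outside.

```lua
-- Number of characters outside round brackets, assuming balanced brackets.
-- gsub returns (string, count); the extra parentheses keep only the string.
local function outsideCount(s)
  return #(s:gsub("%b()", ""))
end

local input = "abc(123)def ghij klm (456) opqrs"
-- X = 5 should give "abc(123)de", X = 10 should give "abc(123)def ghi"
assert(input:sub(1, 10) == "abc(123)de" and outsideCount(input:sub(1, 10)) == 5)
assert(input:sub(1, 15) == "abc(123)def ghi" and outsideCount(input:sub(1, 15)) == 10)
-- and they are the shortest such prefixes: one character less is not enough
assert(outsideCount(input:sub(1, 9)) == 4 and outsideCount(input:sub(1, 14)) == 9)
print("examples check out")
```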

@Joni And Friends:
Your code returns
Code:
s:len() - maxchars
03.12.14 12:24:52 pm
Joni And Friends
User
Like this?
Code:
f="abcd(123) efgh(456)"
maxchar=10
return f:sub(1,maxchar)

So it will return "abcd(123) ".
03.12.14 01:09:38 pm
DC
Admin
From what I understood, he wants something like string.sub, but it should ignore all chars enclosed in "(" and ")" when counting the length (while still returning them in the result).

This code should do it:
Code:
function getSubstrWithoutBrackets(str, len)
    local l = str:len()
    local count = 0
    local bracket = false          -- true while inside ( ... )
    for i = 1, l do
        local sub = str:sub(i, i)
        if not bracket then
            if sub == "(" then
                bracket = true
            else
                count = count + 1
                if count >= len then
                    -- target reached: return everything up to this position
                    return str:sub(1, i)
                end
            end
        elseif sub == ")" then
            bracket = false
        end
    end
    -- fewer than len characters outside brackets: return the whole string
    return str
end

print(getSubstrWithoutBrackets("abc(123)def ghij klm (456) opqrs", 5))
print(getSubstrWithoutBrackets("abc(123)def ghij klm (456) opqrs", 18))
print(getSubstrWithoutBrackets("abc(123)def ghij klm (456) opqrs", 100))

Tested it here: http://www.lua.org/cgi-bin/demo

This is the plain and stupid programmatic approach: iterate over the string and increase a counter unless the characters are within ( and ). When the counter reaches the target size, return the string up to the current position. If the counter never reaches the target while iterating, return the entire string.

It's probably also possible, and much shorter, with some fancy regex, but I'm no regex pro.
edited 1×, last 03.12.14 01:20:43 pm
www.UnrealSoftware.de | www.CS2D.com | www.CarnageContest.com | Use the forum & avoid PMs!
03.12.14 04:20:35 pm
Torque
User
@DC: Thank you very much. This code works flawlessly.
All the 'fancy' regex approaches I tried so far, and the suggestions from other people, resulted in infinite loops.
03.12.14 11:01:31 pm
VADemon
User
@Torque:
After failing at improvising (because of math!), I made a draft on paper and wrote it all down as a little algorithm, because iterating through every letter is for DCs.


It's actually not hard. In every iteration it searches for a bracket group (.-), and if it finds one, it calculates the amount of real text before it (excluding brackets), just to know when to stop. At the same time it counts how long the bracketed texts are. Finally this number is added like this: yourString:sub(1, len + bracketLength), which gives us the result.

Rename the function as you wish. The variables are:
bracketL: the total length of the bracketed text
textL: the raw length of the real (unbracketed) text
prev2, curr1, curr2: 1 is the opening bracket, 2 the closing one
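The code itself is not in this post. As a rough sketch of the idea described (jump from one bracket group to the next with `find` instead of visiting every character), here is a version that uses the balanced pattern `%b()` in place of `(.-)`; the function name `cutOutside` is made up, and `textL`/`bracketL` follow the names above. Because the search position always moves past the group it just found, the loop cannot run forever. Since `%b()` matches balanced groups, nested brackets count as one group, and an unmatched bracket is simply counted as an ordinary character.

```lua
-- Sketch only: jump between bracket groups with find() (hypothetical name)
local function cutOutside(str, n)
  if n <= 0 then return "" end
  local textL, bracketL = 0, 0   -- outside / bracketed characters passed so far
  local pos = 1                  -- where the next search starts
  while true do
    local open, close = str:find("%b()", pos)
    -- outside characters between pos and the next group (or the end of str)
    local gap = (open or #str + 1) - pos
    if not open or textL + gap >= n then
      -- the cut falls in this gap (or the string runs out):
      -- keep n outside characters plus every bracketed character passed
      return str:sub(1, n + bracketL)
    end
    textL = textL + gap
    bracketL = bracketL + (close - open + 1)
    pos = close + 1              -- always moves forward, so the loop ends
  end
end

print(cutOutside("abc(123)def ghij klm (456) opqrs", 10)) --> abc(123)def ghi
```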
04.12.14 09:59:29 am
Torque
User
Thanks for your contribution, VADemon. This problem might be harder than you think:
thestring = "ab(c(123)def(1) g(a=false)hijk4325$#25432lmf(325)opqus"
catches your program in an infinite loop.
04.12.14 10:26:47 am
DC
Admin
My code doesn't handle nested brackets correctly either, because you didn't mention this as a requirement. It doesn't crash, but it returns wrong values: it continues counting as soon as a single ")" occurs after any number of "(". It's rather trivial to fix though: replace the bracket boolean with a counter that starts at 0 and tracks the bracket depth, +1 for "(" and -1 for ")". Then only increase the regular counter when the bracket depth is 0; otherwise just check for "(" and ")" and change the bracket depth accordingly.
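The fix described above might look roughly like this sketch. The function name `getSubstrNested` is hypothetical, and clamping the depth at 0 when a stray ")" appears is an added assumption (without it, a single stray ")" would push the depth below 0 and stop all counting for the rest of the string).

```lua
-- Depth-counter variant of the loop (hypothetical name)
local function getSubstrNested(str, len)
  if len <= 0 then return "" end
  local count, depth = 0, 0
  for i = 1, #str do
    local ch = str:sub(i, i)
    if ch == "(" then
      depth = depth + 1
    elseif ch == ")" then
      -- assumption: ignore a stray ")" instead of letting depth go negative
      depth = math.max(depth - 1, 0)
    elseif depth == 0 then
      count = count + 1
      if count >= len then return str:sub(1, i) end
    end
  end
  return str
end

print(getSubstrNested("a(b(c)d)efg", 3)) --> a(b(c)d)ef
```

With the boolean version, the first ")" in "a(b(c)d)efg" would switch counting back on after "(c)", so "d" would be counted as an outside character.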

@VADemon: I think my code is more straightforward. I didn't have to make a draft to write it. I also doubt that yours is more efficient (because find must also iterate internally), but that probably doesn't matter at all in this case.
edited 1×, last 04.12.14 10:29:40 am
04.12.14 10:28:55 am
Joni And Friends
User
Like this?
Code:
f="abcd(123) efgh(456)"
maxchar=10
return f:sub(1,maxchar)

So it will return "abcd(123) ".
04.12.14 10:31:15 am
DC
Admin
@Joni And Friends: You already posted exactly the same thing, and he said that it's not what he wants, so it makes no sense to post the same thing again.
rules §2.1 - Needless and/or doubled posts (spam) are forbidden, no "+1", "inb4" etc.
04.12.14 11:33:09 am
Torque
User
Thanks for your suggested improvement, DC. It wasn't a requirement to handle nested brackets correctly; I expect correctly closed brackets of depth 1 in the string. But it shouldn't crash when the user input contains a typo.

I think that's a standard requirement of any code: it must not end up in an infinite loop under any circumstances.
04.12.14 01:47:18 pm
DC
Admin
Okay, I see. Yes, that's true of course, at least if you can't be sure that the input is well-formed and as expected, which is obviously the case with direct, unchecked user input.

You could add simple error checking as well. Just make the changes I explained above and add these conditions to the loop:
if bracketDepth > 1 then MALFORMED_BRACKETS_ERROR ("(" used while a "(" is still unclosed; a ")" is expected)
if bracketDepth < 0 then MALFORMED_BRACKETS_ERROR (")" without a preceding unclosed "(")
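MALFORMED_BRACKETS_ERROR above is a placeholder. A sketch that uses Lua's built-in `error()` in its place might look like this; the function name `getSubstrChecked` is hypothetical, and the final check for a "(" that is never closed is an addition not mentioned in the post. Callers that must not stop on bad user input can wrap the call in `pcall`.

```lua
-- Depth-1 brackets only; anything else raises an error (hypothetical name)
local function getSubstrChecked(str, len)
  if len <= 0 then return "" end
  local count, bracketDepth = 0, 0
  for i = 1, #str do
    local ch = str:sub(i, i)
    if ch == "(" then
      bracketDepth = bracketDepth + 1
      if bracketDepth > 1 then
        error('malformed brackets: "(" at position ' .. i .. ' while a "(" is still unclosed')
      end
    elseif ch == ")" then
      bracketDepth = bracketDepth - 1
      if bracketDepth < 0 then
        error('malformed brackets: ")" at position ' .. i .. ' without a preceding "("')
      end
    elseif bracketDepth == 0 then
      count = count + 1
      if count >= len then return str:sub(1, i) end
    end
  end
  -- reached the end of the string: a "(" that was never closed is also malformed
  if bracketDepth ~= 0 then
    error('malformed brackets: unclosed "(" at the end of the string')
  end
  return str
end

-- pcall turns the error into a return value instead of stopping the script
local ok, result = pcall(getSubstrChecked, "ab(c(123)def(1) g(a=false)hijk4325$#25432lmf(325)opqus", 10)
print(ok, result)  -- false, followed by the error message
```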
11.12.14 04:11:41 am
Lee
Moderator
Here's another algorithm that computes what you asked for:

Code:
-- Parse takes an action table t and computes the fixed point of this grammar + string
local function parse(t)
    if t:terminate() then return t end
    -- past the end of the stream, byte() returns nothing and string.char() gives "", which maps to 'null'
    local peek = t.hash(string.char(t.stream:byte(t.at)))
    t.at = t.at + 1
    t[peek](t)
    return t
end

-- Stack of characters consumed grouped by bracketing level
local stack = {0}
-- Maximum number of characters to parse before breaking
local max_char = 10

local function push() table.insert(stack, 1) end
local function pop() assert(#stack > 1, 'Dangling )'); table.remove(stack, #stack) end
local function count() stack[#stack] = stack[#stack] + 1 end

local trace = parse {
    -- This is the string of characters we want to parse
    stream = "1234(xx()X(x))5678()90~~~~~~~",
    -- At the end of the run, at will point just past the 10 unbracketed characters
    at = 1,
    -- Maps each character to an action label ('(' = left, ')' = right, '' = null, anything else = character)
    hash = function(char) return ({['('] = 'left', [')'] = 'right', [''] = 'null'})[char] or 'character' end,
    -- What to do when we encounter a normal character
    character = function(t) count(); parse(t) end,
    -- What to do when we encounter a left parenthesis
    left = function(t) push(); parse(t) end,
    -- What to do when we encounter a right parenthesis
    right = function(t) pop(); parse(t) end,
    -- What to do when we encounter the end of the stream
    null = function(t) --[[Reduce stack]] assert(#stack == 1, 'Dangling (') end,
    -- The condition for us to terminate
    terminate = function(t) return stack[1] >= max_char end,
    -- What was parsed/consumed in this trace
    consumed = function(t) return t.stream:sub(1, t.at - 1) end}

print(trace:consumed())


This is derived from a BNF grammar of the language of matching parentheses. It generalizes nicely: if you want to compute other properties of these kinds of strings, you can change the action associated with each character class from push/pop/count to other functions, because you effectively have the entire tree corresponding to the bracketing structure of the string.
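As an illustration of swapping the actions (not code from the post), here is a self-contained sketch that keeps the same `parse` driver shape but replaces push/pop/count with actions that track the maximum nesting depth. It reads characters with `sub` rather than `string.char(byte(...))`, since `sub` already returns "" past the end of the stream; `depth` and `maxDepth` are names made up for this example.

```lua
-- Same driver shape as parse() above: look up the action for the next
-- character, advance, and let the action continue the parse.
local function parse(t)
  if t:terminate() then return t end
  local peek = t.hash(t.stream:sub(t.at, t.at))  -- "" past the end of the stream
  t.at = t.at + 1
  t[peek](t)
  return t
end

local depth, maxDepth = 0, 0

parse {
  stream = "1234(xx()X(x))5678()90",
  at = 1,
  hash = function(char)
    return ({['('] = 'left', [')'] = 'right', [''] = 'null'})[char] or 'character'
  end,
  -- swapped actions: track nesting depth instead of counting characters
  character = function(t) parse(t) end,
  left = function(t)
    depth = depth + 1
    maxDepth = math.max(maxDepth, depth)
    parse(t)
  end,
  right = function(t) depth = depth - 1; parse(t) end,
  null = function(t) end,                     -- end of stream: stop
  terminate = function(t) return false end,   -- always read the whole stream
}

print(maxDepth) --> 2
```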
11.12.14 03:07:25 pm
Flacko
User
I still don't get what was wrong with balanced bracket matching (%b). After tinkering with it for a while I got this (I must admit that my algorithm-fu is not the strongest):

Code:
function weird(str, n)
    --remember the position of the last right bracket
    local lastright = 0
    --find the next pair of balanced brackets
    local left, right = str:find("%b()")
    while left ~= right do
        --if there is a bracket between lastright and the next pair, the brackets are unbalanced
        if str:find("[%(%)]", lastright+1) < left then
            print("unbalanced brackets")
            return nil
        end
        --if we can return in this section, do so
        if left - lastright > n then
            return str:sub(1, lastright + n)
        end
        --decrease remaining characters
        n = n - (left - lastright - 1)
        lastright = right
        left, right = str:find("%b()", right)
    end
    if str:find("[%(%)]", lastright+1) then
        print("unbalanced brackets")
        return nil
    end
    if str:len() - lastright > n then
        return str:sub(1, lastright + n)
    end
    return str
end

It returns at most n characters outside brackets.
It should work with nested parentheses, and it should report an error (print "unbalanced brackets" and return nil) when it finds unbalanced brackets that could affect the output.
edited 2×, last 12.12.14 03:56:27 pm